
Text Style Transfer Based on Matrix Transformation
HUANG Ruo-Zi1,2, ZHANG Mi1,2
1. Software School, Fudan University, Shanghai 201203, China;
2. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 201203, China
Abstract: Text style transfer has long been an active topic in Natural Language Processing (NLP). In recent years, with the development of sequence generation methods, many researchers have focused on style transfer on non-parallel corpora. Specifically, the task is to change the style of a sentence while preserving its original content. To this end, many methods based on generative adversarial networks have been proposed. However, owing to the instability of adversarial training and the limitation of assuming that style and semantic information are independent, these methods struggle to learn an effective and efficient transfer model. In this study, motivated by statistical learning methods, we give a definition of text style: the style of a corpus can be captured by the covariance matrix of its sentences' semantic vectors. From this perspective, text style depends on all of the semantic information. We then propose a learning-free transfer method that requires only a pre-trained auto-encoder to produce the semantic vectors. Applying a pair of matrix transformations, a whitening transformation followed by a stylizing transformation, to these vectors achieves text style transfer.
Key words: Natural Language Processing (NLP); representation learning; text style transfer

 Fig. 1 Covariance matrices of the Yelp text subsets corresponding to different ratings

1 Capturing Style Information in Text

1.1 Semantic Vectors of Sentences

(1) The sentence embedding should be lossless and reconstructible.

(2) The vectors obtained from different corpora should be separable.

 $\left\{\begin{split} &h_i = \mathrm{LSTM}_E(x_i, h_{i-1})\\ &z = \mathrm{LSTM}_E(x_n, h_{n-1}) \end{split}\right.$

 $\left\{\begin{array}{l} s_1 = \mathrm{LSTM}_D(x_1, z)\\ \quad \vdots\\ s_i = \mathrm{LSTM}_D(x_i, s_{i-1}) \end{array}\right.$

 $p(y_i|s_i) = \mathrm{Softmax}(W_1 s_i)$

 $l_{\rm res} = -\sum\limits_{x \in X} \sum\limits_i \log p(x_i|s_i)$

 $\left\{\begin{split} &p(t = 1|z) = \mathrm{Sigmoid}(W_2 f(z z^{\rm T}))\\ &p(t = 0|z) = 1 - p(t = 1|z) \end{split}\right.$

 $l_{\rm cls} = -\sum\limits_{z_i} \left( t_i \log p(t_i = 1|z_i) + (1 - t_i)\log p(t_i = 0|z_i) \right)$

 $L = {l_{\rm res}} + \alpha {l_{\rm cls}}$
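
The way the two loss terms combine can be illustrated with a minimal NumPy sketch. It assumes that the logits $W_1 s_i$ have already been computed by the decoder and that $f$ simply flattens the outer product $zz^{\rm T}$; the text does not specify $f$, so this choice, like all function names below, is only illustrative.

```python
import numpy as np

def reconstruction_loss(logits, targets):
    """l_res for one sentence: -sum_i log p(x_i | s_i).

    logits:  (n, V) array whose row i holds W_1 s_i
    targets: (n,) integer token ids x_i
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].sum()

def classifier_prob(z, W2):
    """p(t = 1 | z) = Sigmoid(W_2 f(z z^T)); f = flatten is an assumption."""
    features = np.outer(z, z).ravel()
    return 1.0 / (1.0 + np.exp(-(W2 @ features)))

def classification_loss(Z, t, W2, eps=1e-12):
    """l_cls: binary cross-entropy over the columns z_i of Z (shape (d, N))."""
    p = np.array([classifier_prob(z, W2) for z in Z.T])
    return -np.sum(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

def total_loss(l_res, l_cls, alpha):
    """L = l_res + alpha * l_cls."""
    return l_res + alpha * l_cls
```

With an untrained classifier (`W2` all zeros) every $p(t = 1|z_i)$ is 0.5, so `classification_loss` reduces to $N \log 2$, which is a convenient sanity check.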

1.2 Style of a Corpus

 $S = Z{Z^{\rm T}}/(N - 1)$
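
As a small illustration, the style matrix can be computed from a $d \times N$ matrix of semantic vectors as follows. The sketch assumes that $Z$ stores one sentence vector per column and that the vectors are mean-centered before the product, so that $S$ is the sample covariance; the random matrix is only a stand-in for real encoder outputs.

```python
import numpy as np

def style_matrix(Z):
    """Return S = Zc Zc^T / (N - 1) for Z of shape (d, N), one column per sentence.

    Centering the columns first is an assumption made here so that S is
    the sample covariance matrix of the semantic vectors.
    """
    Zc = Z - Z.mean(axis=1, keepdims=True)
    return Zc @ Zc.T / (Z.shape[1] - 1)

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 200))  # toy stand-in for encoder outputs
S = style_matrix(Z)            # (4, 4) symmetric matrix
```

Under these assumptions the result agrees with `np.cov(Z)`, which also treats each row as a variable and each column as an observation.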

2 A Learning-Free Style Transfer Method

 $S = P\Lambda {P^{\rm T}}$

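ZCA whitening: The whitening transformation removes the correlations among the dimensions of the vectors; after whitening, the covariance matrix of the vectors is the identity matrix: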

 $Z' = P_2 \Lambda_2^{-\frac{1}{2}} P_2^{\rm T}\left(Z - \hat{z}_2 1_d^{\rm T}\right)$

 $Z'' = P_1 \Lambda_1^{\frac{1}{2}} P_1^{\rm T} Z' + \hat{z}_1 1_d^{\rm T}$
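
The two transformations can be sketched in NumPy as below. The sketch assumes that corpus 2 is the source and corpus 1 the target, as the subscripts in the two formulas suggest, and that each matrix stores one semantic vector per column. The function names, the small `eps` added for numerical stability, and the toy data are illustrative assumptions, not part of the method description.

```python
import numpy as np

def mean_and_eig(Z):
    """Mean vector and eigendecomposition S = P diag(lam) P^T of the covariance of Z."""
    mu = Z.mean(axis=1, keepdims=True)
    Zc = Z - mu
    S = Zc @ Zc.T / (Z.shape[1] - 1)
    lam, P = np.linalg.eigh(S)  # eigh: S is symmetric
    return mu, lam, P

def whiten(Z, mu, lam, P, eps=1e-10):
    """Z' = P lam^(-1/2) P^T (Z - mu); eps guards against tiny eigenvalues."""
    return P @ np.diag((lam + eps) ** -0.5) @ P.T @ (Z - mu)

def stylize(Z_white, mu, lam, P):
    """Z'' = P lam^(1/2) P^T Z' + mu."""
    return P @ np.diag(np.maximum(lam, 0.0) ** 0.5) @ P.T @ Z_white + mu

def transfer(Z_src, Z_tgt):
    """Whiten the source vectors with their own statistics, then apply the target's."""
    Z_white = whiten(Z_src, *mean_and_eig(Z_src))
    return Z_white, stylize(Z_white, *mean_and_eig(Z_tgt))

# Toy data standing in for the semantic vectors of two corpora.
rng = np.random.default_rng(0)
A2 = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 1.5]])
A1 = np.array([[1.0, 0.0, 0.0], [0.8, 1.2, 0.0], [0.2, 0.4, 0.7]])
Z2 = A2 @ rng.normal(size=(3, 500)) + 1.0  # source corpus
Z1 = A1 @ rng.normal(size=(3, 400)) - 2.0  # target corpus
Z_white, Z_out = transfer(Z2, Z1)
```

After the transfer, the covariance of `Z_white` is numerically the identity matrix, and `Z_out` has the mean and covariance of the target corpus while each column still derives from one source sentence.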

 Fig. 2 The learning-free style transfer method

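(1) Pre-training: For the two corpora on the left, with positive and negative sentiment respectively, an auto-encoder is trained for reconstruction, while a classifier is trained in a semi-supervised manner in the latent semantic space, thereby adjusting the distribution of that space.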

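(2) Obtaining the style representation: After pre-training converges, the resulting semantic vectors are used to compute the semantic covariance matrices of the two corpora.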

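(3) Style transfer: The whitening-stylizing transformation operators are applied to the semantic vectors of the two corpora to transfer their style, and the transferred vectors are decoded with the already-trained decoder.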

3 Experiments

3.1 Experimental Setup

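CrossAligned[3]: This model assumes that corpora of different styles share a style-independent semantic space. It aligns the distributions of the different corpora in this space through adversarial training, with the aim of removing style information.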

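StyleEmbedding[2]: This model explicitly learns an embedding for each style and feeds the style embedding together with the semantic vector into the decoder, so a single auto-encoder suffices for multiple styles.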

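Accuracy: To evaluate whether the generated text has the intended style, we first pre-train a text style classifier on the training set. The classifier uses the TextCNN model[11] and reaches a classification accuracy of 97.23% on the test set. We use this classifier's accuracy on the transferred text as the evaluation metric; that is, the more transferred sentences "fool" the style classifier, the better the score on this metric.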
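
The metric itself is a simple ratio. A minimal sketch, assuming the classifier's predicted labels for the transferred sentences are already available as a list (the function name and the label encoding are illustrative):

```python
def transfer_accuracy(predicted_styles, target_styles):
    """Fraction of transferred sentences that the style classifier
    assigns to the intended target style."""
    if len(predicted_styles) != len(target_styles):
        raise ValueError("one prediction is required per target label")
    if not target_styles:
        raise ValueError("at least one sentence is required")
    hits = sum(p == t for p, t in zip(predicted_styles, target_styles))
    return hits / len(target_styles)

# Hypothetical labels: 1 = positive, 0 = negative.
example = transfer_accuracy([1, 1, 0, 1], [1, 1, 1, 1])
```

In this toy case three of the four transferred sentences are classified as the target style.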

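BLEU: To evaluate whether the generated text preserves the content of the source text while changing its style, we compute the cumulative 4-gram BLEU score with the source text as the reference. A higher BLEU indicates greater similarity to the source text.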
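
For reference, a simplified sentence-level version of cumulative 4-gram BLEU can be written in plain Python. The sketch assumes a single reference, uniform weights over 1- to 4-grams, whitespace tokenization, and no smoothing; the evaluation script actually used may differ (for example, by computing a corpus-level score).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(reference, hypothesis):
    """Cumulative 4-gram BLEU of one hypothesis against one reference (token lists)."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        total = sum(hyp.values())
        clipped = sum(min(count, ref[gram]) for gram, count in hyp.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any empty n-gram precision gives 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / 4)

source = "the food was really great".split()
transferred = "the food was really bad".split()
score = bleu4(source, transferred)
```

Here the n-gram precisions are 4/5, 3/4, 2/3 and 1/2, so the score is their geometric mean, $0.2^{1/4}$; an output identical to the source scores 1.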

3.2 Experimental Results

 Fig. 3 Experimental results

4 Conclusion

