计算机系统应用 (Computer Systems & Applications) 2020, Vol. 29, Issue (6): 163-168

Target-Specific Sentiment Analysis Based on Multi-Attention Network
SONG Shu-Guang, XU Ying-Xiao
School of Computer Science, Fudan University, Shanghai 201203, China
Abstract: As a classic research direction in natural language processing, target-specific sentiment analysis aims to determine the sentiment polarity of a specific target based on its context. The key to improving performance on this task is better mining the semantic representations of the specific target and its context. This study proposes a multi-attention network with phrase features (PEMAN). By introducing phrase-level semantic features, it constructs a multi-attention network over multi-granularity features, which effectively improves the expressive ability of the model. Experimental results on the SemEval-2014 Task 4 Laptop and Restaurant datasets show that the proposed PEMAN model achieves a measurable improvement in accuracy over the baseline models.
Key words: sentiment analysis     attention mechanism     natural language processing

1 Introduction

2 Related Work

3 The PEMAN Model

(1) Input layer: processes the model input and performs the embedding operation.

(2) Encoding layer: encodes the input with a Bi-LSTM [11] and injects position information.

(3) Multi-attention layer: computes over the hidden-state outputs with two attention interaction matrices to obtain the final semantic representations.

(4) Output layer: performs sentiment classification using the semantic representations produced by the multi-attention layer.

3.1 Input Layer

 $C = [W_c^1;W_c^2;\cdots;W_c^n] \in {R^{n \times d}}$
 $A = [W_a^1;W_a^2;\cdots;W_a^m] \in {R^{m \times d}}$

 ${p_i} = \max (C[i:i + x]),\;\;i = 1,2,\cdots,n - x + 1$ (1)
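Eq. (1) builds phrase-level features by taking a max over each window of $x$ consecutive word vectors in the context matrix $C$. A minimal numpy sketch, assuming the max is element-wise across the window (the function name `phrase_features` is illustrative, not from the paper):

```python
import numpy as np

def phrase_features(C, x):
    """Element-wise max over each window of x consecutive word vectors.

    C: (n, d) context embedding matrix.
    Returns a (n - x + 1, d) phrase feature matrix, one row per window.
    """
    n, d = C.shape
    return np.stack([C[i:i + x].max(axis=0) for i in range(n - x + 1)])

C = np.arange(12, dtype=float).reshape(4, 3)  # n=4 words, d=3 dimensions
P = phrase_features(C, x=2)                   # 3 phrase vectors of size 3
```

Each row of `P` summarizes one bigram-sized window, so the phrase sequence is shorter than the word sequence by $x-1$ positions.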

 Figure 1. Architecture of the PEMAN model

3.2 Encoding Layer

 $\overrightarrow {{h_c}} = \overrightarrow {LSTM} ([W_c^1;W_c^2;\cdots;W_c^n])$ (2)
 $\overleftarrow {{h_c}} = \overleftarrow {LSTM} ([W_c^1;W_c^2;\cdots;W_c^n])$ (3)

 ${h_c} = [\overrightarrow {{h_c}} ,\overleftarrow {{h_c}} ]$ (4)

 ${h_a} = [\overrightarrow {{h_a}} ,\overleftarrow {{h_a}} ]$ (5)
 ${h_p} = [\overrightarrow {{h_p}} ,\overleftarrow {{h_p}} ]$ (6)

 ${v_t} = 1 - \frac{l}{{n - m + 1}}$ (7)

 ${h_c} = [h_c^1 * {v_1},h_c^2 * {v_2},\cdots,h_c^n * {v_n}]$ (8)
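Eqs. (7) and (8) down-weight context hidden states by their distance from the target. A numpy sketch, under the assumption that $l$ in Eq. (7) is word $t$'s distance to the target span (zero inside the span), with $n$ the context length and $m$ the target length; the helper name and the `target_start` parameter are illustrative:

```python
import numpy as np

def position_weights(n, target_start, m):
    """Eq. (7): v_t = 1 - l / (n - m + 1), where l is the distance from
    word t to the target span (assumed 0 for words inside the span)."""
    v = np.empty(n)
    for t in range(n):
        if t < target_start:                    # word precedes the target
            l = target_start - t
        elif t >= target_start + m:             # word follows the target
            l = t - (target_start + m - 1)
        else:                                   # word is part of the target
            l = 0
        v[t] = 1 - l / (n - m + 1)
    return v

h_c = np.random.randn(6, 4)                     # six context hidden states
v = position_weights(6, target_start=2, m=1)
h_c_weighted = h_c * v[:, None]                 # Eq. (8): scale each state
```

Words closer to the target keep weights near 1, so the attention layer that follows sees them almost unchanged, while distant words are attenuated.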
3.3 Multi-Attention Layer

 ${I^{ctx}} = {h_c} \cdot h_a^{\rm T}$ (9)

 $\alpha _{ij}^{ctx} = \frac{{\exp (I_{ij}^{ctx})}}{{\displaystyle\sum\nolimits_i {\exp (I_{ij}^{ctx})} }},\beta _{ij}^{ctx} = \frac{{\exp (I_{ij}^{ctx})}}{{\displaystyle\sum\nolimits_j {\exp (I_{ij}^{ctx})} }}$ (10)
 $\beta _j^{ctx} = \frac{1}{n}\sum\nolimits_i {\beta _{ij}^{ctx}}$ (11)

 ${\gamma ^{ctx}} = {\alpha ^{ctx}} \cdot {\overline {{\beta ^{ctx}}} ^{\rm T}}$ (12)

 ${I^{prs}} = {h_p} \cdot h_a^{\rm T}$ (13)
 $\alpha _{ij}^{prs} = \frac{{\exp (I_{ij}^{prs})}}{{\displaystyle\sum\nolimits_i {\exp (I_{ij}^{prs})} }},\beta _{ij}^{prs} = \frac{{\exp (I_{ij}^{prs})}}{{\displaystyle\sum\nolimits_j {\exp (I_{ij}^{prs})} }}$ (14)
 $\beta _j^{prs} = \frac{1}{n}\sum\nolimits_i {\beta _{ij}^{prs}}$ (15)
 ${\gamma ^{prs}} = {\alpha ^{prs}} \cdot {\overline {{\beta ^{prs}}} ^{\rm T}}$ (16)
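Eqs. (9)-(12) (and their phrase-level counterparts, Eqs. (13)-(16)) follow the attention-over-attention pattern: an interaction matrix between context and target hidden states, column-wise and row-wise softmax normalizations, an average of the target-to-context attention, and a final weighted combination. A self-contained numpy sketch of one such branch (function names are illustrative):

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aoa_attention(h_c, h_a):
    """Attention-over-attention, Eqs. (9)-(12).

    h_c: (n, d) context hidden states; h_a: (m, d) target hidden states.
    Returns gamma, the (n, 1) final attention weights over context words.
    """
    I = h_c @ h_a.T                     # Eq. (9): (n, m) interaction matrix
    alpha = softmax(I, axis=0)          # Eq. (10): normalize over context i
    beta = softmax(I, axis=1)           # Eq. (10): normalize over target j
    beta_bar = beta.mean(axis=0)        # Eq. (11): (m,) averaged attention
    gamma = alpha @ beta_bar[:, None]   # Eq. (12): (n, 1) combined weights
    return gamma

rng = np.random.default_rng(0)
h_c = rng.standard_normal((5, 4))       # 5 context hidden states
h_a = rng.standard_normal((2, 4))       # 2 target hidden states
gamma = aoa_attention(h_c, h_a)
```

Because each column of `alpha` and the averaged `beta_bar` both sum to 1, the resulting `gamma` is itself a probability distribution over context words.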
3.4 Output Layer

 ${r_{ctx}} = h_c^{\rm T} \cdot {\gamma ^{ctx}}$ (17)
 ${r_{prs}} = h_p^{\rm T} \cdot {\gamma ^{prs}}$ (18)

 $r = [{r_{ctx}};{r_{prs}}]$ (19)
 $y = {\rm{softmax}} ({W_l} \cdot r + {b_l})$ (20)

 $loss = - \sum\nolimits_k {\sum\nolimits_{i \in C} {y_i^g \cdot \log ({y_i})} } + \lambda ||\theta |{|^2}$ (21)
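Eqs. (17)-(20) combine the two attention branches into a single classification: each hidden-state matrix is weighted by its attention vector, the two representations are concatenated, and a softmax layer maps them to sentiment probabilities. A minimal numpy sketch (the function name and the random shapes are illustrative; in the model, $W_l$ and $b_l$ are learned):

```python
import numpy as np

def classify(h_c, gamma_ctx, h_p, gamma_prs, W, b):
    """Output layer, Eqs. (17)-(20)."""
    r_ctx = h_c.T @ gamma_ctx           # Eq. (17): (d, 1) weighted context
    r_prs = h_p.T @ gamma_prs           # Eq. (18): (d, 1) weighted phrase
    r = np.vstack([r_ctx, r_prs])       # Eq. (19): (2d, 1) concatenation
    z = W @ r + b                       # Eq. (20): linear projection
    e = np.exp(z - z.max())
    return e / e.sum()                  # softmax over sentiment classes

rng = np.random.default_rng(1)
h_c = rng.standard_normal((5, 4))       # 5 context states, d=4
h_p = rng.standard_normal((4, 4))       # 4 phrase states, d=4
g_c = np.full((5, 1), 0.2)              # uniform attention for the sketch
g_p = np.full((4, 1), 0.25)
W, b = rng.standard_normal((3, 8)), np.zeros((3, 1))
y = classify(h_c, g_c, h_p, g_p, W, b)  # probabilities over 3 polarities
```

Training then minimizes the cross-entropy of Eq. (21) between `y` and the one-hot gold label, plus the L2 penalty on the parameters.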

4 Experiments and Analysis

4.1 Datasets

4.2 Hyperparameter Settings

4.3 Results and Discussion

4.4 Case Analysis

 Figure 2. Attention weight distributions over a sentence in the AOA [10] and PEMAN models

5 Conclusion and Future Work

[1] Hochreiter S, Schmidhuber J. LSTM can solve hard long time lag problems. Proceedings of the 9th International Conference on Neural Information Processing Systems. Cambridge, UK. 1996. 473–479.
[2] Pontiki M, Galanis D, Pavlopoulos J, et al. SemEval-2014 task 4: Aspect based sentiment analysis. Proceedings of the 8th International Workshop on Semantic Evaluation. Dublin, Ireland. 2014. 27–35.
[3] Vo DT, Zhang Y. Target-dependent twitter sentiment classification with rich automatic features. Proceedings of the 24th International Conference on Artificial Intelligence. Denver, CO, USA. 2015. 1347–1353.
[4] Kiritchenko S, Zhu XD, Cherry C, et al. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. Proceedings of the 8th International Workshop on Semantic Evaluation. Dublin, Ireland. 2014. 437–442.
[5] Tang DY, Qin B, Feng XC, et al. Effective LSTMs for target-dependent sentiment classification. arXiv: 1512.01100, 2015.
[6] Wang YQ, Huang ML, Zhao L, et al. Attention-based LSTM for aspect-level sentiment classification. Proceedings of 2016 Conference on Empirical Methods in Natural Language Processing. Austin, TX, USA. 2016. 606–615.
[7] Tang DY, Qin B, Liu T. Aspect level sentiment classification with deep memory network. arXiv: 1605.08900, 2016.
[8] Ma DH, Li SJ, Zhang XD, et al. Interactive attention networks for aspect-level sentiment classification. arXiv: 1709.00893, 2017.
[9] Chen P, Sun ZQ, Bing LD, et al. Recurrent attention network on memory for aspect sentiment analysis. Proceedings of 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark. 2017. 452–461.
[10] Huang BX, Ou YL, Carley KM. Aspect level sentiment classification with attention-over-attention neural networks. Proceedings of the 11th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Washington, WA, USA. 2018. 197–206.
[11] Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 2005, 18(5–6): 602–610.
[12] Pennington J, Socher R, Manning C. Glove: Global vectors for word representation. Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar. 2014. 1532–1543.
[13] Deng LY. The cross-entropy method: A unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning. Technometrics, 2006, 48(1): 147–148. DOI:10.1198/tech.2006.s353
[14] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on International Conference on Machine Learning. Lille, France. 2015. 448–456.
[15] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 2014, 15(56): 1929–1958.
[16] Bottou L. Large-scale machine learning with stochastic gradient descent. Proceedings of the 19th International Conference on Computational Statistics. Paris, France. 2010. 177–186.