Computer Systems & Applications, 2018, Vol. 27, Issue (11): 9-16

Research on Product Feature-Opinion Extraction Based on Center Node Recognition in Bipartite Network
LIU Chen, JI Li, TANG Li
Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
Foundation item: National Natural Science Foundation of China (71401107)
Abstract: This study takes product review texts on e-commerce platforms as the mining object and focuses on identifying feature words and opinion words in reviews. First, we build a bipartite network of feature-opinion words and give an algorithm for ranking node importance in this network. Finally, the algorithm is applied to real review text data to verify its effectiveness.
Key words: feature-opinion extraction     bipartite network     center node recognition     product review

1 Introduction

2 Construction of the Feature-Opinion Pair Bipartite Network

2.1 Representation of the Feature-Opinion Pair Bipartite Network

Fig. 1 Bipartite network with feature-opinion words

2.2 Degree and Node Weight in the Feature-Opinion Pair Bipartite Network

 ${k_i} = \sum\limits_{j = 1}^N {{a_{ij}}} = \sum\limits_{j = 1}^N {{a_{ji}}}$ (1)

 $A = {({a_{ij}})_{N \times N}}$ (2)

 ${S_i} = \sum\limits_{j \in {N_i}} {{w_{ij}}}$ (3)
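As a minimal sketch of Eqs. (1) to (3), the following Python computes the degree $k_i$ from a symmetric 0/1 adjacency matrix and the node weight $S_i$ from a weight matrix. The toy matrix, the node ordering (two feature nodes followed by two opinion nodes), and the function names are assumptions made for illustration only; they do not come from the paper's data.

```python
def degrees(A):
    """Eq. (1): k_i is the row sum of the 0/1 adjacency matrix A."""
    return [sum(row) for row in A]


def node_weights(W):
    """Eq. (3): S_i is the sum of w_ij over the neighbours j of node i.

    A zero entry in W is taken to mean "no edge", so a plain row sum
    is enough here (an assumption of this sketch).
    """
    return [sum(row) for row in W]


# Hypothetical toy network: nodes 0 and 1 are features, 2 and 3 are opinions.
# W[i][j] is the assumed co-occurrence count of feature i and opinion j.
W = [
    [0, 0, 3, 1],
    [0, 0, 0, 2],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
]

# Eq. (2): the N x N adjacency matrix A = (a_ij), derived from W.
A = [[1 if w > 0 else 0 for w in row] for row in W]

k = degrees(A)
S = node_weights(W)
print("degrees:", k)
print("node weights:", S)
```

Because the network is bipartite, the feature-feature and opinion-opinion blocks of both matrices stay zero, and the symmetry of A is what makes the two sums in Eq. (1) equal.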

3 Feature-Opinion Pair Extraction

3.1 B-Core Decomposition Algorithm

CFO: the set of candidate feature and opinion words.

B: the unweighted feature-opinion pair bipartite network. $i$ denotes a node in the network.

Ranking set: the new ranked set of feature and opinion words.

Step 1: Input: CFO

Step 2: Build network B

Step 3: For $\scriptstyle i$ in B:

E is the empty set

$\scriptstyle{b_{\min }} = \mathrm{min\_degree}(B)$

If $\scriptstyle i$ is a feature:

If $\scriptstyle i.\mathrm{degree} \leqslant {b_{\min }}$ :

$\scriptstyle i$ is inserted into E

If $\scriptstyle i$ is an opinion:

If $\scriptstyle i.\mathrm{degree} \leqslant {b_{\min }}$ :

$\scriptstyle i$ is inserted into E

E is inserted into Ranking set

E is deleted

Update B

The degree of every node is recalculated

Step 4: Output: Ranking set
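The steps above can be sketched in Python as follows. This is one reading of the pseudocode, not the authors' implementation: it assumes that Step 3 is repeated until the network is empty, that "E is deleted" means the nodes in E are removed from B, and that nodes removed later are the more central ones. The function name `b_core_ranking` and the sample word pairs are hypothetical.

```python
def b_core_ranking(pairs):
    """Peel an unweighted feature-opinion bipartite network round by round.

    pairs: iterable of (feature_word, opinion_word) tuples (the CFO set).
    Returns a list of sets; each set is one E, in order of removal.
    """
    # Build B as an adjacency map; nodes are tagged so that a word used
    # both as a feature and as an opinion stays two distinct nodes.
    adj = {}
    for feature, opinion in pairs:
        f_node, o_node = ("feature", feature), ("opinion", opinion)
        adj.setdefault(f_node, set()).add(o_node)
        adj.setdefault(o_node, set()).add(f_node)

    ranking = []
    while adj:
        # b_min is the smallest degree currently present in B.
        b_min = min(len(nbrs) for nbrs in adj.values())
        # Feature and opinion nodes are tested with the same condition.
        E = {n for n, nbrs in adj.items() if len(nbrs) <= b_min}
        ranking.append(E)
        # Remove E from B; degrees are recalculated implicitly because
        # each neighbour set shrinks.
        for n in E:
            for m in adj.pop(n):
                if m in adj:
                    adj[m].discard(n)
    return ranking


# Hypothetical candidate pairs, for illustration only.
pairs = [
    ("screen", "clear"), ("screen", "big"), ("screen", "good"),
    ("battery", "good"), ("battery", "durable"), ("price", "good"),
]
ranking = b_core_ranking(pairs)
for round_no, E in enumerate(ranking, start=1):
    print(round_no, sorted(E))
```

Note that this sketch recomputes $b_{\min}$ in every round, as the pseudocode does, so the threshold can fall as well as rise; that differs from the classical k-shell decomposition, where the threshold only increases.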

Fig. 2 Unweighted bipartite network with feature-opinion words

Fig. 3 Node importance ranking

3.2 BW-Core Decomposition Algorithm

CFO: the set of candidate feature and opinion words.

B: the weighted feature-opinion pair bipartite network. $i$ denotes a node in the network.

Ranking set: the new ranked set of feature and opinion words.

Step 1: Input: CFO

Step 2: Build network B

Step 3: For $\scriptstyle i$ in B:

E is the empty set

$\scriptstyle b{w_{\min }} = \mathrm{min\_weight}(B)$

$\scriptstyle a \geqslant b{w_{\min }}$

If $\scriptstyle i$ is a feature:

If $\scriptstyle i.\mathrm{weight} \leqslant b{w_{\min }}$ :

$\scriptstyle i$ is inserted into E

If $\scriptstyle i$ is an opinion:

If $\scriptstyle i.\mathrm{weight} \leqslant b{w_{\min }}$ :

$\scriptstyle i$ is inserted into E

E is inserted into Ranking set

E is deleted

Update B

The weight of every node is recalculated

Step 4: Output: Ranking set
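A weighted counterpart of the procedure can be sketched in the same way, with the node weight $S_i$ of Eq. (3) taking the place of the degree. As before, this is a hedged reading of the pseudocode: the loop is assumed to run until B is empty, "E is deleted" is assumed to remove the nodes in E from B, and the line involving $a$ is left out because its role is not recoverable from the text. The name `bw_core_ranking` and the sample weights are hypothetical.

```python
def bw_core_ranking(weighted_pairs):
    """Peel a weighted feature-opinion bipartite network round by round.

    weighted_pairs: dict mapping (feature_word, opinion_word) to an
    edge weight, assumed here to be a co-occurrence count.
    Returns a list of sets; each set is one E, in order of removal.
    """
    adj = {}
    for (feature, opinion), w in weighted_pairs.items():
        f_node, o_node = ("feature", feature), ("opinion", opinion)
        adj.setdefault(f_node, {})[o_node] = w
        adj.setdefault(o_node, {})[f_node] = w

    ranking = []
    while adj:
        # Node weight: sum of the weights of the edges still attached.
        weight = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
        bw_min = min(weight.values())
        E = {n for n, s in weight.items() if s <= bw_min}
        ranking.append(E)
        # Remove E from B; node weights are recomputed in the next round.
        for n in E:
            for m in adj.pop(n):
                if m in adj:
                    adj[m].pop(n, None)
    return ranking


# Hypothetical weighted candidate pairs, for illustration only.
weighted_pairs = {
    ("screen", "clear"): 5,
    ("screen", "big"): 1,
    ("battery", "durable"): 2,
    ("battery", "good"): 1,
    ("price", "good"): 1,
}
ranking_w = bw_core_ranking(weighted_pairs)
for round_no, E in enumerate(ranking_w, start=1):
    print(round_no, sorted(E))
```

In this toy input, "screen" and "clear" are connected by one heavy edge and survive to the last round, although "clear" has degree 1 and would be removed in the first round of the unweighted procedure.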

Fig. 4 Weighted bipartite network with feature-opinion words

Fig. 5 Node importance ranking

4 Experiments

4.1 Experimental Dataset

Fig. 6 Syntactic analysis result

4.3 Experimental Results

 $P= x/(x + y)$ (4)
 $R = x/(x + z)$ (5)
 $F = (2 \times R \times P)/(R + P)$ (6)
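Eqs. (4) to (6) can be checked with a few lines of Python. The surviving text does not define $x$, $y$, and $z$; the sketch below assumes the usual reading, in which $x$ is the number of correctly extracted pairs, $y$ the number of incorrectly extracted pairs, and $z$ the number of correct pairs that were missed. The counts in the example are made up.

```python
def precision_recall_f(x, y, z):
    """Return (P, R, F) following Eqs. (4)-(6).

    x, y, z are assumed to be the counts of correct extractions,
    incorrect extractions, and missed correct pairs, respectively.
    """
    p = x / (x + y) if (x + y) else 0.0            # Eq. (4)
    r = x / (x + z) if (x + z) else 0.0            # Eq. (5)
    f = (2 * r * p) / (r + p) if (r + p) else 0.0  # Eq. (6)
    return p, r, f


# Hypothetical counts, not taken from the paper's experiments.
p, r, f = precision_recall_f(x=8, y=2, z=8)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

The guards against a zero denominator are an addition of this sketch; the equations themselves leave those cases undefined.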

Fig. 7 Degree distribution of feature nodes

Fig. 8 Degree distribution of opinion nodes

Fig. 9 Distribution of P, R, and F values in the unweighted bipartite network

Fig. 10 Distribution of P, R, and F values in the weighted bipartite network

5 Conclusion
