Computer Systems & Applications, 2018, Vol. 27, Issue (9): 157–162


Question Categorization of Community Question Answering by Combining Bi-LSTM and CNN with Attention Mechanism
SHI Meng-Fei¹, YANG Yan¹, HE Liang¹, CHEN Cheng-Cai²
1. School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China;
2. Xiaoi Robot Technology Co. Ltd., Shanghai 201803, China
Foundation item: Project of Economic and Information Committee of Shanghai Municipality (201602024); Project of Municipal Science and Technology Committee of Shanghai Municipality (14DZ2260800)
Abstract: The goal of question categorization is to classify natural language questions raised by users into predefined categories. How to classify questions accurately and efficiently is an important task in community question answering. In this study, we propose a question categorization method based on deep neural networks. First, the words of the question are transformed into vectors. Then, we use a novel Bidirectional Long Short-Term Memory (Bi-LSTM) based Convolutional Neural Network (CNN) model with an attention mechanism to capture the most important features of a question. Finally, the features are fed into a classifier to predict the category of the question. We use the Bi-LSTM and CNN to capture question features because of their strength in representing sentence-level documents. We also use the answer set to enrich the information of the question. Experimental results on several datasets demonstrate the effectiveness of the proposed approach.
Key words: question classification     answer set     attention mechanism     deep neural network

1 Related Work

Hui et al.[5] took word order and word distance in the question text into account and proposed an extended class sequential rule model for question classification; Mishra et al.[6] trained different classifiers (Naive Bayes, nearest neighbor, support vector machine) on lexical, semantic, and syntactic features extracted from the question text; Aikawa et al.[7] divided questions into subjective and objective classes according to the asker's intent and classified them with a smoothed Naive Bayes method. Liu et al.[8] proposed a kernel method based on syntactic dependency relations and part-of-speech features on top of SVM. To reduce the high cost of feature extraction in question classification, Yang et al.[9] proposed a question feature model that combines basic features with bag-of-words binding features, yielding a more effective question feature set.

2 Question Classification Method

2.1 Word Embedding Layer

 ${e_i} = {E_w}{v^i}$ (1)
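Eq. (1) maps each word to its embedding by multiplying the embedding matrix $E_w$ with the word's one-hot vector $v^i$. A minimal numpy sketch of this lookup follows; the toy vocabulary, embedding dimension, and random $E_w$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of Eq. (1): e_i = E_w v^i, where v^i is the one-hot column
# vector of the i-th word. Vocabulary, dimension d, and the random
# embedding matrix E_w are assumed for illustration.
rng = np.random.default_rng(0)
vocab = {"what": 0, "is": 1, "attention": 2}
d = 4                                        # embedding dimension (assumed)
E_w = rng.standard_normal((d, len(vocab)))   # one column per vocabulary word

def embed(word_id: int) -> np.ndarray:
    v = np.zeros(len(vocab))
    v[word_id] = 1.0                         # one-hot vector v^i
    return E_w @ v                           # e_i = E_w v^i

e = embed(vocab["attention"])
# The matrix-vector product reduces to a column lookup:
assert np.allclose(e, E_w[:, vocab["attention"]])
```

In practice the multiplication is never materialized; frameworks implement it directly as an index into the embedding table.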

Figure 1. Architecture of the question classification method based on deep neural networks

2.2 Convolutional Layer

 ${h_{0:n - 1}} = {h_0} \oplus {h_1} \oplus \cdots \oplus {h_{n - 1}}$ (2)

 ${c_i} = f(w \cdot {h_{i:i + m - 1}} + b)$ (3)

 ${c^*} = [{c_0},{c_1},\cdots,{c_{n - m}}]$ (4)
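Eqs. (2)–(4) can be sketched as a single filter sliding a window of $m$ consecutive vectors over the sequence $h_0 \ldots h_{n-1}$, producing the feature map $c^* = [c_0, \ldots, c_{n-m}]$. The shapes, random weights, and the choice of ReLU for $f$ below are illustrative assumptions.

```python
import numpy as np

# Sketch of Eqs. (2)-(4): one convolutional filter over windows of m
# concatenated vectors. Shapes, weights, and f = ReLU are assumptions.
rng = np.random.default_rng(1)
n, d, m = 6, 4, 3                       # sequence length, vector dim, window size
H = rng.standard_normal((n, d))         # rows are h_0 .. h_{n-1}
w = rng.standard_normal(m * d)          # filter over a concatenated window
b = 0.1

def feature_map(H: np.ndarray) -> np.ndarray:
    c = []
    for i in range(n - m + 1):
        window = H[i:i + m].reshape(-1)         # h_i ⊕ ... ⊕ h_{i+m-1}, Eq. (2)
        c.append(max(0.0, w @ window + b))      # c_i = f(w · h_{i:i+m-1} + b), Eq. (3)
    return np.array(c)                          # c* = [c_0, ..., c_{n-m}], Eq. (4)

c_star = feature_map(H)
assert c_star.shape == (n - m + 1,)
```

A real model uses many such filters with several window sizes; each filter contributes one feature map of length $n-m+1$.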
2.3 Bidirectional LSTM Layer

 ${i_t} = \sigma ({{\rm W}_{xi}}{x_t} + {{\rm W}_{hi}}{h_{t - 1}} + {{\rm W}_{ci}}{c_{t - 1}} + {b_i})$ (5)
 ${f_t} = \sigma ({{\rm W}_{xf}}{x_t} + {{\rm W}_{hf}}{h_{t - 1}} + {{\rm W}_{cf}}{c_{t - 1}} + {b_f})$ (6)
 ${g_t} = \tanh ({{\rm W}_{xc}}{x_t} + {{\rm W}_{hc}}{h_{t - 1}} + {b_c})$ (7)
 ${c_t} = {i_t}{g_t} + {f_t}{c_{t - 1}}$ (8)
 ${o_t} = \sigma ({{\rm W}_{xo}}{x_t} + {{\rm W}_{ho}}{h_{t - 1}} + {{\rm W}_{co}}{c_t} + {b_o})$ (9)
 ${h_t} = {o_t}\tanh({c_t})$ (10)

 ${h_i} = [\overrightarrow {{h_i}} \oplus \overleftarrow {{h_i}} ]$ (11)
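The gated recurrence in Eqs. (5)–(10) and the bidirectional concatenation of Eq. (11) can be sketched as below. All weights are random illustrative values (a real model learns them), and the peephole connections $W_{ci}$, $W_{cf}$, $W_{co}$ are written as dense matrices for simplicity.

```python
import numpy as np

# Sketch of the peephole LSTM step, Eqs. (5)-(10), and the bidirectional
# concatenation of Eq. (11). Weights are random illustrative values.
rng = np.random.default_rng(2)
dx, dh = 3, 4
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
# Input-to-hidden matrices are (dh, dx); hidden/cell matrices are (dh, dh).
W = {k: rng.standard_normal((dh, dx if k[1] == "x" else dh))
     for k in ["Wxi", "Whi", "Wci", "Wxf", "Whf", "Wcf",
               "Wxc", "Whc", "Wxo", "Who", "Wco"]}
b = {k: np.zeros(dh) for k in ["bi", "bf", "bc", "bo"]}

def lstm_step(x, h_prev, c_prev):
    i = sigmoid(W["Wxi"] @ x + W["Whi"] @ h_prev + W["Wci"] @ c_prev + b["bi"])  # Eq. (5)
    f = sigmoid(W["Wxf"] @ x + W["Whf"] @ h_prev + W["Wcf"] @ c_prev + b["bf"])  # Eq. (6)
    g = np.tanh(W["Wxc"] @ x + W["Whc"] @ h_prev + b["bc"])                      # Eq. (7)
    c = i * g + f * c_prev                                                       # Eq. (8)
    o = sigmoid(W["Wxo"] @ x + W["Who"] @ h_prev + W["Wco"] @ c + b["bo"])       # Eq. (9)
    h = o * np.tanh(c)                                                           # Eq. (10)
    return h, c

def run(seq):
    h, c, hs = np.zeros(dh), np.zeros(dh), []
    for x in seq:
        h, c = lstm_step(x, h, c)
        hs.append(h)
    return hs

xs = [rng.standard_normal(dx) for _ in range(5)]
fwd = run(xs)                       # forward pass
bwd = run(xs[::-1])[::-1]           # backward pass, re-aligned to positions
h_i = [np.concatenate([f_, b_]) for f_, b_ in zip(fwd, bwd)]   # Eq. (11)
assert h_i[0].shape == (2 * dh,)
```

Note that sharing one weight set between directions is a simplification; a real Bi-LSTM keeps separate parameters for the forward and backward passes.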

2.4 Attention Layer

 $M = \tanh(H)$ (12)
 $\alpha = {\rm softmax}({w^{\rm T}}M)$ (13)
 $r = H{\alpha ^{\rm T}}$ (14)

 ${c^*} = \tanh(r)$ (15)
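Eqs. (12)–(15) pool the Bi-LSTM outputs into a single sentence representation. A minimal sketch, assuming $H$ collects hidden states as columns and $w$ is a learned scoring vector (random here):

```python
import numpy as np

# Sketch of the attention layer, Eqs. (12)-(15). H's columns are the
# Bi-LSTM outputs; w is a learned scoring vector (random here).
rng = np.random.default_rng(3)
dh, n = 4, 6
H = rng.standard_normal((dh, n))        # column t is the hidden state h_t

M = np.tanh(H)                                   # Eq. (12)
w = rng.standard_normal(dh)
scores = w @ M                                   # w^T M: one score per time step
alpha = np.exp(scores) / np.exp(scores).sum()    # Eq. (13): softmax weights
r = H @ alpha                                    # Eq. (14): weighted sum of columns
c_star = np.tanh(r)                              # Eq. (15)

assert np.isclose(alpha.sum(), 1.0) and c_star.shape == (dh,)
```

The weights α make the pooling interpretable: the time steps with the highest α contribute most to the final representation $c^*$.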
2.5 Classifier

 $\widehat p(y|Q) = {\rm softmax}({W^{(Q)}}{c^*} + {b^{(Q)}})$ (16)
 $\widehat y = \mathop{\arg\max}\limits_y \widehat p(y|Q)$ (17)

 $J(\theta ) = - \frac{1}{m}\sum\limits_{i = 1}^m {{t_i}\log({y_i})} + \lambda \left\| \theta \right\|_F^2$ (18)
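The classifier of Eqs. (16)–(17) and the regularized cross-entropy objective of Eq. (18) can be sketched as follows; the feature dimension, class count, weights, and λ are illustrative assumptions.

```python
import numpy as np

# Sketch of Eqs. (16)-(18): a softmax classifier over the attention
# output c*, trained with cross-entropy plus an L2 (Frobenius) penalty.
rng = np.random.default_rng(4)
dh, K = 4, 3                            # feature dim, number of classes (assumed)
WQ = rng.standard_normal((K, dh))       # W^(Q)
bQ = np.zeros(K)                        # b^(Q)
lam = 1e-3                              # λ, the regularization strength (assumed)

def softmax(z):
    z = z - z.max()                     # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def predict(c_star):
    p = softmax(WQ @ c_star + bQ)       # Eq. (16): p̂(y|Q)
    return p, int(np.argmax(p))         # Eq. (17): ŷ = argmax p̂(y|Q)

def loss(batch, labels):
    # Eq. (18): mean cross-entropy over m examples plus λ‖θ‖_F².
    ce = -np.mean([np.log(predict(c)[0][y]) for c, y in zip(batch, labels)])
    return ce + lam * np.sum(WQ ** 2)

batch = [rng.standard_normal(dh) for _ in range(5)]
labels = [0, 1, 2, 1, 0]
assert loss(batch, labels) > 0.0
```

In Eq. (18), $t_i$ is the gold label indicator and $y_i$ the predicted probability of the gold class; the penalty here covers only $W^{(Q)}$ for brevity, whereas θ in the paper ranges over all model parameters.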

2.6 Regularization

3 Experimental Analysis

3.1 Datasets

(1) TREC: The TREC question set contains factoid questions and follows the widely used UIUC English question taxonomy: questions are divided into 6 coarse classes (ABBR, DESC, ENTY, HUM, LOC, NUM)[1] and 50 fine classes, with each coarse class containing several fine classes. We choose this dataset because it is a classic, widely used benchmark that demonstrates the performance of the method well.

(2) YahooAns: The YahooAns dataset is a collection of questions with answer information crawled from the Yahoo! Answers community and verified by manual review. It mainly contains the following 4 categories: "information", "advice", "opinion", and "polling".

(3) CQA dataset: The CQA dataset consists of questions with answer information crawled from Baidu Knows and 360 Wenda. All selected questions fall into 3 categories: computer & network, sports, and region.

3.2 Experimental Settings

3.2.1 Parameter Settings

3.2.2 Evaluation Metrics

 $Acc = \frac{{AccNum}}{{TNum}}$ (19)
 $RMSE = \sqrt {\frac{{\sum\nolimits_{i = 1}^{TNum} {{{\left( {{p_i} - {g_i}} \right)}^2}} }}{{TNum}}}$ (20)
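The two metrics above can be computed directly from the predicted and gold labels; the toy predictions below are illustrative.

```python
import numpy as np

# Sketch of Eqs. (19)-(20): accuracy as the share of correctly classified
# questions, and RMSE between predicted (p_i) and gold (g_i) labels.
pred = np.array([0, 1, 2, 1, 0, 2])     # predicted class ids (toy data)
gold = np.array([0, 1, 1, 1, 0, 2])     # gold class ids (toy data)

acc = np.mean(pred == gold)                       # Eq. (19): AccNum / TNum
rmse = np.sqrt(np.mean((pred - gold) ** 2))       # Eq. (20)

assert abs(acc - 5 / 6) < 1e-12
```

Note that RMSE over class ids is only meaningful when the label encoding is ordinal or, as here, when it is used as a coarse error indicator alongside accuracy.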

3.3 Experimental Results

4 Conclusion and Future Work

[1] Li X, Roth D. Learning question classifiers. Proceedings of the 19th International Conference on Computational Linguistics. Taipei, China. 2002. 1–7. [doi: 10.3115/1072228.1072378]
[2] Zhen LH, Wang XL, Yang SC. A survey of question classification in automatic question answering systems. Journal of Anhui University of Technology (Natural Science), 2015, 32(1): 48–54, 66. DOI:10.3969/j.issn.1671-7872.2015.01.010
[3] Shen D, Pan R, Sun JT, et al. Query enrichment for web-query classification. ACM Transactions on Information Systems, 2006, 24(3): 320–352. DOI:10.1145/1165774
[4] Broder A, Fontoura M, Gabrilovich E, et al. Robust classification of rare queries using Web knowledge. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'07). Amsterdam, Holland. 2007. 231–238. [doi: 10.1145/1277741.1277783]
[5] Hui ZJ, Liu J, Ouyang LM. Question classification based on an extended class sequential rule model. Proceedings of the 5th International Joint Conference on Natural Language Processing. Chiang Mai, Thailand. 2011. 938–946.
[6] Mishra M, Mishra VK, Sharma HR. Question classification using semantic, syntactic and lexical features. International Journal of Web & Semantic Technology, 2013, 4(3): 39–47.
[7] Aikawa N, Sakai T, Yamana H. Community QA question classification: Is the asker looking for subjective answers or not? IPSJ Online Transactions, 2011(4): 160–168. DOI:10.2197/ipsjtrans.4.160
[8] Liu L, Yu ZT, Guo JY, et al. Chinese question classification based on question property kernel. International Journal of Machine Learning and Cybernetics, 2014, 5(5): 713–720. DOI:10.1007/s13042-013-0216-y
[9] Yang SC, Gao C, Qin F, et al. A question feature model combining basic features and bag-of-words binding features. Journal of Chinese Information Processing, 2012, 26(5): 46–52. DOI:10.3969/j.issn.1003-0077.2012.05.008
[10] Wang YN, Sun BY. EEG classification of smoking craving based on convolutional neural networks. Computer Systems & Applications, 2017, 26(6): 254–258.
[11] Wei YC, Zhao Y, Lu CY, et al. Cross-modal retrieval with CNN visual features: A new baseline. IEEE Transactions on Cybernetics, 2017, 47(2): 449–460. DOI:10.1109/TCYB.2016.2519449
[12] Hao YC, Zhang YZ, Liu K, et al. An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada. 2017. 221–231. [doi: 10.18653/v1/P17-1021]
[13] Kim Y. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
[14] Shi YY, Yao KS, Tian L, et al. Deep LSTM based feature mapping for query classification. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, CA. 2016. 1501–1511. [doi: 10.18653/v1/N16-1176]
[15] Graves A, Mohamed AR, Hinton G. Speech recognition with deep recurrent neural networks. Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada. 2013. 6645–6649. [doi: 10.1109/ICASSP.2013.6638947]
[16] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.