Computer Systems & Applications, 2018, Vol. 27, Issue 9: 157-162

Question Categorization of Community Question Answering by Combining Bi-LSTM and CNN with Attention Mechanism
SHI Meng-Fei$^1$, YANG Yan$^1$, HE Liang$^1$, CHEN Cheng-Cai$^2$
1. School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China;
2. Xiaoi Robot Technology Co. Ltd., Shanghai 201803, China
Foundation item: Project of Economic and Information Committee of Shanghai Municipality (201602024); Project of Municipal Science and Technology Committee of Shanghai Municipality (14DZ2260800)
Abstract: The goal of question categorization is to classify natural language questions raised by users into predefined categories. Classifying questions accurately and efficiently is an important task in community question answering. In this study, we propose a question categorization method based on a deep neural network. First, the words of the question are transformed into vectors. Then, a novel Convolutional Neural Network (CNN) model based on Bidirectional Long Short-Term Memory (Bi-LSTM) with an attention mechanism captures the most important features of the question. Finally, the features are fed into a classifier to predict the category of the question. We use Bi-LSTM and CNN to capture question features because both are well suited to representing sentence-level documents. We also use the answer set to enrich the information in the question. Experimental results on several datasets demonstrate the effectiveness of the proposed approach.
Key words: question classification; answer set; attention mechanism; deep neural network

1 Related Work

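Hui et al. [5] proposed an extended class rule model that takes word order and the distance between words in the question text into account when classifying questions. Mishra et al. [6] trained different classifiers (naive Bayes, nearest neighbor, support vector machine) on lexical, semantic, and syntactic features extracted from the question text. Aikawa et al. [7] divided questions into subjective and objective ones, according to whether the user's expectation is subjective or objective, and classified them with a smoothed naive Bayes method. Liu et al. [8] proposed a kernel method built on SVM that relies on syntactic dependency relations and part-of-speech features. To reduce the excessive cost of feature extraction in question classification, Yang Sichun et al. [9] proposed a question feature model that combines basic features with bag-of-words binding features in order to obtain a more effective question feature set.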

2 Question Classification Method

2.1 Word Embedding Layer

 ${e_i} = {E_w}{v^i}$ (1)
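Equation (1) can be read as a table lookup. The following is a minimal sketch, assuming that $v^i$ is the one-hot vector of the $i$-th word and that each column of $E_w$ holds the embedding of one vocabulary word; the matrix values and the helper names `one_hot` and `embed` are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def one_hot(index: int, vocab_size: int) -> List[float]:
    """Return the one-hot vector v^i for a word index (assumed encoding)."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def embed(E_w: Matrix, v: List[float]) -> List[float]:
    """Compute e_i = E_w v^i as a plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in E_w]


# Hypothetical 2-dimensional embeddings for a 3-word vocabulary;
# column j of E_w is the vector of word j.
E_w = [
    [0.1, 0.4, 0.7],
    [0.2, 0.5, 0.8],
]

e = embed(E_w, one_hot(1, vocab_size=3))
column_1 = [row[1] for row in E_w]  # direct lookup of the same column
print(e, column_1)
```

Under these assumptions the product simply selects one column of $E_w$, which is why practical implementations usually index the matrix instead of multiplying.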

 Fig. 1 Architecture of the question classification method based on a deep neural network

2.2 Convolutional Layer

 ${h_{0:n - 1}} = {h_0} \oplus {h_1} \oplus \cdots \oplus {h_{n - 1}}$ (2)

 ${c_i} = f(w \cdot {h_{i:i + m - 1}} + b)$ (3)

 ${c^*} = [{c_0},{c_1},\cdots,{c_{n - m}}]$ (4)
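Equations (2) to (4) describe a one-dimensional convolution over a sliding window of $m$ consecutive vectors. The sketch below is one possible reading, assuming that $\oplus$ is vector concatenation, that $w$ is a flat filter of length $m$ times the vector dimension, and that the nonlinearity $f$ is tanh (the source does not state which function is used). The input values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def concat(vectors: List[Vector]) -> Vector:
    """Concatenate vectors, i.e. the circled-plus operator in Eq. (2)."""
    flat: Vector = []
    for vec in vectors:
        flat.extend(vec)
    return flat


def conv_features(h: List[Vector], w: Vector, b: float, m: int) -> Vector:
    """Return c* = [c_0, ..., c_{n-m}] with c_i = f(w . h_{i:i+m-1} + b)."""
    n = len(h)
    features = []
    for i in range(n - m + 1):
        window = concat(h[i:i + m])  # h_{i:i+m-1}
        pre_activation = sum(wi * xi for wi, xi in zip(w, window)) + b
        features.append(math.tanh(pre_activation))  # f assumed to be tanh
    return features


# Hypothetical input: n = 4 vectors of dimension 2, window size m = 2.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w = [0.5, -0.5, 0.25, 0.25]
c_star = conv_features(h, w, b=0.0, m=2)
print(c_star)
```

With $n$ input vectors and window size $m$, the feature map has $n - m + 1$ entries, matching the index range $0, \ldots, n - m$ in Eq. (4).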
2.3 Bidirectional Long Short-Term Memory Layer

 ${i_t} = \sigma ({{\rm W}_{xi}}{x_t} + {{\rm W}_{hi}}{h_{t - 1}} + {{\rm W}_{ci}}{c_{t - 1}} + {b_i})$ (5)
 ${f_t} = \sigma ({{\rm W}_{xf}}{x_t} + {{\rm W}_{hf}}{h_{t - 1}} + {{\rm W}_{cf}}{c_{t - 1}} + {b_f})$ (6)
 ${g_t} = \tanh ({{\rm W}_{xc}}{x_t} + {{\rm W}_{hc}}{h_{t - 1}} + {{\rm W}_{cc}}{c_{t - 1}} + {b_c})$ (7)
 ${c_t} = {i_t}{g_t} + {f_t}{c_{t - 1}}$ (8)
 ${o_t} = \sigma ({{\rm W}_{xo}}{x_t} + {{\rm W}_{ho}}{h_{t - 1}} + {{\rm W}_{co}}{c_t} + {b_o})$ (9)
 ${h_t} = {o_t}\tanh({c_t})$ (10)

 ${h_i} = [\overrightarrow {{h_i}} \oplus \overleftarrow {{h_i}} ]$ (11)
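The recurrence in Eqs. (5) to (10) and the concatenation in Eq. (11) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the cell-state terms use dense matrices as the equations are written, initial states are zero, each direction has its own parameter set, the recurrent weight of the candidate state is named `W_hc` to follow the x/h/c naming pattern of the other gates, and `random_params` is a hypothetical helper that fills the weights with small random values.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, list]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in W]


def add(*vectors: Vector) -> Vector:
    return [sum(parts) for parts in zip(*vectors)]


def mul(a: Vector, b: Vector) -> Vector:
    return [x * y for x, y in zip(a, b)]


def sigmoid(z: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def tanh(z: Vector) -> Vector:
    return [math.tanh(v) for v in z]


def gate(p: Params, name: str, x_t: Vector, h_prev: Vector, c_in: Vector) -> Vector:
    """Pre-activation W_x* x_t + W_h* h_{t-1} + W_c* c + b_* for one gate."""
    return add(matvec(p["W_x" + name], x_t), matvec(p["W_h" + name], h_prev),
               matvec(p["W_c" + name], c_in), p["b_" + name])


def lstm_step(p: Params, x_t: Vector, h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
    i_t = sigmoid(gate(p, "i", x_t, h_prev, c_prev))  # Eq. (5)
    f_t = sigmoid(gate(p, "f", x_t, h_prev, c_prev))  # Eq. (6)
    g_t = tanh(gate(p, "c", x_t, h_prev, c_prev))     # Eq. (7)
    c_t = add(mul(i_t, g_t), mul(f_t, c_prev))        # Eq. (8)
    o_t = sigmoid(gate(p, "o", x_t, h_prev, c_t))     # Eq. (9), uses the new c_t
    h_t = mul(o_t, tanh(c_t))                         # Eq. (10)
    return h_t, c_t


def run_lstm(p: Params, xs: List[Vector], hidden: int) -> List[Vector]:
    h, c = [0.0] * hidden, [0.0] * hidden  # zero initial states (assumption)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(p, x_t, h, c)
        outputs.append(h)
    return outputs


def bi_lstm(p_fwd: Params, p_bwd: Params, xs: List[Vector], hidden: int) -> List[Vector]:
    forward = run_lstm(p_fwd, xs, hidden)
    backward = run_lstm(p_bwd, xs[::-1], hidden)[::-1]  # re-align to input order
    return [f + b for f, b in zip(forward, backward)]   # Eq. (11): concatenation


def random_params(input_dim: int, hidden: int, seed: int) -> Params:
    """Hypothetical helper: small random weights and zero biases."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p: Params = {}
    for name in "ifco":
        p["W_x" + name] = mat(hidden, input_dim)
        p["W_h" + name] = mat(hidden, hidden)
        p["W_c" + name] = mat(hidden, hidden)
        p["b_" + name] = [0.0] * hidden
    return p


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = bi_lstm(random_params(2, 2, seed=0), random_params(2, 2, seed=1), xs, hidden=2)
print(len(H), len(H[0]))
```

Each output vector has twice the hidden size because the forward and backward states for the same position are concatenated.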

2.4 Attention Layer

 $M = \tanh(H)$ (12)
 $\alpha = {\rm softmax}({w^{\rm T}}M)$ (13)
 $r = H{\alpha ^{\rm T}}$ (14)

 ${c^*} = \tanh(r)$ (15)
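The attention computation in Eqs. (12) to (15) can be sketched as below. The sketch assumes that $H$ stores one hidden vector per time step as its columns, that $w$ is a trainable vector with the same dimension as a hidden vector, and that the superscripts on $w$ and $\alpha$ denote transposes. The input values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(H_cols: List[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Return (alpha, c*) for hidden vectors H_cols (the columns of H)."""
    M = [[math.tanh(v) for v in h] for h in H_cols]              # Eq. (12)
    scores = [sum(wi * mi for wi, mi in zip(w, m)) for m in M]   # w^T M
    alpha = softmax(scores)                                      # Eq. (13)
    dim = len(w)
    r = [sum(a * h[k] for a, h in zip(alpha, H_cols)) for k in range(dim)]  # Eq. (14)
    c_star = [math.tanh(v) for v in r]                           # Eq. (15)
    return alpha, c_star


# With a zero attention vector every time step receives the same weight,
# so r is the plain average of the hidden vectors.
alpha, c_star = attention([[1.0, 0.0], [0.0, 1.0]], w=[0.0, 0.0])
print(alpha, c_star)
```

The weights in `alpha` sum to one, so `r` is a convex combination of the hidden vectors, and the time steps with the highest scores dominate the sentence representation.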
2.5 Classifier

 $\widehat p(y|Q) = {\rm softmax}({W^{(Q)}}{c^*} + {b^{(Q)}})$ (16)
 $\widehat y = \mathop {\rm argmax}\limits_y \widehat p(y|Q)$ (17)
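Equations (16) and (17) amount to a linear layer followed by a softmax and an argmax over the classes. A minimal sketch follows, with hypothetical weights for a 3-class problem and a 2-dimensional feature vector $c^*$; the function name `predict` is illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W_Q: Matrix, b_Q: Vector, c_star: Vector) -> Tuple[Vector, int]:
    """Return the class distribution (Eq. 16) and the predicted class (Eq. 17)."""
    logits = [sum(w * c for w, c in zip(row, c_star)) + b
              for row, b in zip(W_Q, b_Q)]
    probs = softmax(logits)
    y_hat = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
    return probs, y_hat


# Hypothetical parameters: one row of W_Q per class.
W_Q = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_Q = [0.0, 0.0, 0.0]
probs, y_hat = predict(W_Q, b_Q, c_star=[0.2, 0.9])
print(probs, y_hat)
```

Because softmax is monotonic, taking the argmax of the probabilities gives the same class as taking the argmax of the raw scores.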

 $J(\theta ) = - \frac{1}{m}\sum\limits_{i = 1}^m {{t_i}\log({y_i})} + \lambda \left\| \theta \right\|_F^2$ (18)
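The objective in Eq. (18) combines an averaged cross-entropy term with an L2 penalty. The sketch below assumes that $t_i$ is the one-hot target of the $i$-th training question, that $y_i$ is the predicted class distribution, that $t_i \log(y_i)$ is summed over classes, and that $\theta$ can be treated as a flat list of all trainable parameters. The numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def regularized_loss(targets: List[Vector], probs: List[Vector],
                     theta: Vector, lam: float) -> float:
    """J(theta) = -(1/m) * sum_i t_i . log(y_i) + lam * ||theta||_F^2."""
    m = len(targets)
    cross_entropy = 0.0
    for t_vec, y_vec in zip(targets, probs):
        # Only the true class contributes because t_i is one-hot.
        cross_entropy -= sum(t * math.log(y) for t, y in zip(t_vec, y_vec) if t > 0)
    penalty = lam * sum(p * p for p in theta)  # squared Frobenius norm
    return cross_entropy / m + penalty


targets = [[1.0, 0.0], [0.0, 1.0]]
probs = [[0.5, 0.5], [0.25, 0.75]]
loss = regularized_loss(targets, probs, theta=[1.0, 2.0], lam=0.1)
print(loss)
```

The penalty is added once to the averaged cross-entropy, not once per training example.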

2.6 Regularization

3 Experimental Analysis

3.1 Datasets

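(1) TREC: The TREC question set contains a collection of factoid questions and follows the widely used UIUC English question taxonomy. Questions fall into 6 coarse classes (ABBR, DESC, ENTY, HUM, LOC, NUM) [1] and 50 fine classes, and each coarse class contains several fine classes. We chose this dataset because it is a classic, widely used benchmark and therefore a good test of the method's performance.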

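(2) YahooAns: The YahooAns dataset is a set of questions with answer information collected from the Yahoo! Answers community and verified by manual review. It mainly contains the following 4 categories: "information", "advice", "opinion", and "polling".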

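(3) CQA dataset: The CQA dataset consists of questions with answer information crawled from Baidu Zhidao and 360 Wenda. All selected questions are divided into 3 categories: computers and networking, sports, and regions.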

3.2 Experimental Setup

3.2.1 Parameter Settings

3.2.2 Evaluation Metrics

 $Acc = \frac{{AccNum}}{{TNum}}$ (19)
 $RMSE = \sqrt {\frac{{\sum\nolimits_{i = 1}^{TNum} {{{\left( {{p_i} - {g_i}} \right)}^2}} }}{{TNum}}}$ (20)
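Both metrics are simple to compute once predictions are available. The sketch below assumes that $AccNum$ is the number of correctly classified questions, that $TNum$ is the total number of test questions, and that the two quantities compared in the RMSE are the predicted and gold label values of each question, encoded as integers. The label lists are hypothetical.

```python
import math
from typing import List


def accuracy(pred: List[int], gold: List[int]) -> float:
    """Acc = AccNum / TNum, Eq. (19)."""
    acc_num = sum(1 for p, g in zip(pred, gold) if p == g)
    return acc_num / len(gold)


def rmse(pred: List[int], gold: List[int]) -> float:
    """Root mean squared error between predicted and gold labels, Eq. (20)."""
    squared = sum((p - g) ** 2 for p, g in zip(pred, gold))
    return math.sqrt(squared / len(gold))


pred = [0, 1, 2, 1]
gold = [0, 1, 1, 1]
acc = accuracy(pred, gold)
err = rmse(pred, gold)
print(acc, err)
```

Higher accuracy and lower RMSE both indicate better classification, so the two metrics move in opposite directions as a model improves.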

3.3 Experimental Results

4 Conclusion and Future Work

