Computer Systems & Applications, 2021, 30(4): 175-180
Text Similarity Matching Model Based on Positive and Negative Samples and Bi-LSTM
(College of Information Science and Technology, Qingdao University of Science & Technology, Qingdao 266061, China)
Received: July 30, 2020    Revised: August 26, 2020
Abstract: Similarity matching is an important branch of natural language processing and one of the main ways a question answering system extracts answers. This study proposes a text similarity matching model based on positive and negative samples and Bi-LSTM. First, to raise the similarity between a question and its correct answer, the model constructs positive and negative question-answer pairs for training. Second, to reduce the experimental error caused by word segmentation mistakes, it pre-trains with a dual-layer word vector embedding method. Third, to address the backward offset of feature vectors caused by the attention mechanism, it applies an internal attention mechanism before feature extraction. Fourth, to retain important temporal characteristics, it trains on the data with a Bi-LSTM neural network. Finally, to compute similarity at the semantic level, it introduces a similarity calculation function that incorporates semantic information. The proposed model is evaluated in simulation experiments on the public DuReader dataset and compared with other models. The experimental results show that the proposed model is both accurate and robust, reaching a top-1 accuracy of 78.34%.
Keywords: question answering system; similarity matching; positive and negative samples; Bi-LSTM
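The idea of training on positive and negative question-answer pairs and then ranking candidates by similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed vectors stand in for encoder outputs (a Bi-LSTM in the paper), cosine similarity is swapped in for the paper's semantic similarity function (which the abstract does not specify), and the `margin` value and all function names are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed stand-in metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_hinge_loss(q, pos, neg, margin=0.2):
    """Loss is zero once the positive answer outscores the negative by `margin`."""
    return max(0.0, margin - cosine(q, pos) + cosine(q, neg))


def top1_index(q, candidates):
    """Index of the candidate answer most similar to the question."""
    return max(range(len(candidates)), key=lambda i: cosine(q, candidates[i]))


# Hypothetical encoded vectors for one question, a correct and a wrong answer.
question = [1.0, 0.0]
correct = [0.9, 0.1]
wrong = [0.0, 1.0]

loss_separated = pairwise_hinge_loss(question, correct, wrong)
loss_swapped = pairwise_hinge_loss(question, wrong, correct)
best = top1_index(question, [wrong, correct])
```

Under these assumptions, the loss vanishes when the correct answer is already well separated from the wrong one and is positive when their roles are swapped. Top-1 accuracy is then the fraction of questions for which `top1_index` selects the correct answer.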
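Funding: National Natural Science Foundation of China (61402246); Science and Technology Program of Higher Education Institutions of Shandong Province (J14LN31)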
Cite this article:
ZHOU Yan-Ping, ZHU Xiao-Hu. Text Similarity Matching Model Based on Positive and Negative Samples and Bi-LSTM. Computer Systems & Applications, 2021, 30(4): 175-180