Similar Text Matching Based on Siamese Network and Char-word Vector Combination

doi:10.15888/j.cnki.csa.008756

2022, Vol. 31, No. 10, pp. 295-302

Abstract:

Text similarity matching underlies many natural language processing tasks. This paper proposes a text similarity matching method that combines a Siamese network with character-level and word-level vectors. Following the Siamese design, the method models each text as a whole and then judges whether the two texts are similar. First, during feature extraction, BERT and WoBERT are used to extract character-level and word-level sentence vectors, respectively, and the two are combined so that the resulting sentence vector carries richer semantic information. Second, because fusing the two feature vectors makes the dimensionality too large, principal component analysis (PCA) is applied to reduce the high-dimensional vectors and to remove redundant information and noise. Finally, a Softmax classifier produces the similarity matching result. Experiments on the LCQMC dataset show that the model reaches an accuracy of 89.92% and an F1 score of 88.52%, which the authors take as evidence that it extracts text semantic information better and is better suited to the text similarity matching task.
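
The pipeline described in the abstract (character-level and word-level sentence vectors, fusion, PCA, Softmax) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Random arrays stand in for the BERT and WoBERT sentence vectors, and the 768-dimensional size, the 16 PCA components, the (u, v, |u - v|) pair features, and the untrained weights are all assumptions made for the example; the function names are hypothetical.

```python
import numpy as np


def fuse_char_word(char_vecs, word_vecs):
    """Concatenate character-level and word-level sentence vectors row by row."""
    return np.concatenate([char_vecs, word_vecs], axis=1)


def fit_pca(X, k):
    """Fit a k-component PCA on the rows of X via SVD; return (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def apply_pca(X, mean, components):
    """Project the rows of X onto the fitted principal components."""
    return (X - mean) @ components.T


def pair_features(u, v):
    """Assumed pair representation: both vectors plus their absolute difference."""
    return np.concatenate([u, v, np.abs(u - v)], axis=1)


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_pairs, dim = 32, 768  # assumed sizes; 768 is the BERT-base hidden size

# Stand-ins for encoder outputs. In a real system these would come from
# BERT (character level) and WoBERT (word level) for sentences A and B.
char_a, word_a = rng.normal(size=(n_pairs, dim)), rng.normal(size=(n_pairs, dim))
char_b, word_b = rng.normal(size=(n_pairs, dim)), rng.normal(size=(n_pairs, dim))

# Siamese idea: both sentences go through the same fusion and the same PCA.
fused_a = fuse_char_word(char_a, word_a)  # shape (32, 1536)
fused_b = fuse_char_word(char_b, word_b)  # shape (32, 1536)
mean, comps = fit_pca(np.vstack([fused_a, fused_b]), k=16)
red_a = apply_pca(fused_a, mean, comps)   # shape (32, 16)
red_b = apply_pca(fused_b, mean, comps)   # shape (32, 16)

# Untrained two-class Softmax layer (similar / not similar), for shape checking only.
feats = pair_features(red_a, red_b)       # shape (32, 48)
W = rng.normal(scale=0.01, size=(feats.shape[1], 2))
b = np.zeros(2)
probs = softmax(feats @ W + b)            # shape (32, 2)
```

In practice the Softmax weights would be trained on labeled sentence pairs such as those in LCQMC, and the number of PCA components would be chosen empirically; neither value is given in the abstract.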

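Cite this article

Li Yilin (李奕霖), Zhou Yanping (周艳平). Similar Text Matching Based on Siamese Network and Char-word Vector Combination. Computer Systems & Applications (计算机系统应用), 2022, 31(10): 295-302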

History

Received: 2022-01-21
Last revised: 2022-02-22
Published online: 2022-06-24
