Computer Systems Applications, English version: 2023, 32(5): 291-299
Co-driven Recognition of Semantic Entailment Based on Fusion of Transformer and HowNet Sememe Knowledge
(1. School of Mechanical Science & Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; 2. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China)
Received: October 2, 2022    Revised: November 4, 2022
Abstract: Semantic entailment recognition aims to determine whether two sentences are semantically consistent and whether an entailment relationship holds between them. Existing methods, however, struggle with Chinese synonyms, polysemy, and long texts that are hard to interpret. To address these problems, this study proposes a co-driven Chinese semantic entailment recognition method based on the fusion of Transformer and HowNet sememe knowledge. First, a Transformer encodes the internal structural and semantic information of Chinese sentences at multiple levels (the data-driven component), and the external knowledge base HowNet is introduced to model sememe-level associations between words (the knowledge-driven component). Next, soft-attention computes the interaction attention between the two sentences, and the result is fused with the sememe matrix. Finally, a BiLSTM further encodes the concept-level semantic information of the text and infers semantic consistency and the entailment relationship. The method uses HowNet sememe knowledge to handle polysemy and synonymy, and uses the Transformer to handle long texts. Experiments on financial and multi-semantic paraphrase-pair datasets such as BQ, AFQMC, and PAWSX show that, compared with lightweight models such as DSSM, MwAN, and DRCN and pre-trained models such as ERNIE, the model improves the accuracy of Chinese semantic entailment recognition (2.19% higher than the DSSM model), keeps the parameter count small (16 M), and adapts to long-text entailment recognition for texts of 50 or more Chinese characters.
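The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the soft-attention scores and the sememe matrix are combined, so the additive bias used here (dot-product score plus `gamma` times a binary sememe-overlap indicator) is an assumption, not the authors' formula. The function names, the `gamma` parameter, the toy encodings, and the hand-written sememe sets are all hypothetical; in the described system the encodings would come from the Transformer and the sememes from HowNet.

```python
import math


def sememe_matrix(sememes_a, sememes_b):
    """Binary matrix: entry [i][j] is 1.0 when word i of sentence A and
    word j of sentence B share at least one sememe (assumed definition)."""
    return [[1.0 if sa & sb else 0.0 for sb in sememes_b] for sa in sememes_a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def fused_soft_attention(enc_a, enc_b, sem, gamma=1.0):
    """Align every token of A against B.

    Score = dot product of the two token encodings + gamma * sememe overlap.
    The additive fusion is an illustrative assumption.
    Returns (attention weights, B-side summary vector for each token of A).
    """
    dim = len(enc_b[0])
    weights, aligned = [], []
    for i, a_vec in enumerate(enc_a):
        scores = [
            sum(x * y for x, y in zip(a_vec, b_vec)) + gamma * sem[i][j]
            for j, b_vec in enumerate(enc_b)
        ]
        w = softmax(scores)
        weights.append(w)
        aligned.append(
            [sum(w[j] * enc_b[j][k] for j in range(len(enc_b))) for k in range(dim)]
        )
    return weights, aligned


# Toy inputs: 2-dimensional stand-ins for encoder outputs and
# hand-written sememe sets (not taken from HowNet).
enc_a = [[1.0, 0.0], [0.0, 1.0]]
enc_b = [[0.5, 0.5], [0.5, 0.5]]
sem = sememe_matrix([{"money"}, {"borrow"}], [{"lend", "borrow"}, {"money"}])

weights, aligned = fused_soft_attention(enc_a, enc_b, sem, gamma=1.0)
```

In this toy case the two tokens of sentence B have identical encodings, so plain dot-product attention cannot tell them apart; the sememe term is what shifts the weight toward the token that shares a sememe. Setting `gamma=0.0` recovers plain soft-attention.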
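Funding: National Key Research and Development Program of China (2021YFB2012202); Major Science and Technology Project of Hubei Province (2020AEA011); Key Research and Development Program of Hubei Province (2020BAB100, 2021BAA171, 2021BAA038)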
Citation:
CHEN Fan, HUANG Yan, ZHANG Xin-Fang. Co-driven Recognition of Semantic Entailment Based on Fusion of Transformer and HowNet Sememe Knowledge. Computer Systems Applications, 2023, 32(5): 291-299