Grammatical Error Correction Model Based on Differential Fusion Syntactic Feature
Authors: LUO Song, WANG Chunmei, YUAN Feiniu, DAI Wei

Fund Project: National Natural Science Foundation of China (62272308)

    Abstract:

    Current English grammatical error correction (GEC) models tend to ignore the syntactic knowledge in text, even though this knowledge benefits error correction, and their correction ability suffers as a result. To address this problem, this study proposes an English GEC model based on differentially fused syntactic features. First, the proposed syntactic encoder generates dependency graphs and constituency trees directly from raw text in an unsupervised way, fuses these two heterogeneous syntactic structures, and encodes them into a high-dimensional syntactic representation. Second, to exploit both the semantic and the syntactic information in the text, the differential fusion module first applies differential regularization so that the semantic encoder captures the semantic features the syntactic encoder fails to produce, and then fuses the syntactic and semantic representations through co-attention. The fused representation is the output of the Transformer encoder and is passed to the decoder, which generates grammatically correct text. Comparative experiments on the CoNLL-2014 English GEC dataset show that the method outperforms the GEC model based on the Copy-Augmented Transformer in both precision and F0.5, improving F0.5 by 5.2 percentage points. The syntactic knowledge also mitigates the shortage of annotated training data, giving better error correction performance.
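The abstract describes the differential fusion module only in words, so the following is a minimal sketch under explicit assumptions, not the authors' implementation. It assumes that the semantic and syntactic representations are token-aligned matrices of the same width d; it takes "differential regularization" to be an orthogonality-style difference loss (the squared Frobenius norm of semᵀ·syn, as used in domain separation networks); and it takes "co-attention" to be bidirectional scaled dot-product attention with keys doubling as values and no learned projections. All function names are hypothetical, and the paper's actual equations may differ.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = feature dimensions


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_weights(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product weights softmax(Q K^T / sqrt(d))."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    return softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])


def co_attention_fuse(sem: Matrix, syn: Matrix) -> Matrix:
    """Hypothetical bidirectional fusion of semantic and syntactic token vectors.

    Semantic states attend over syntactic states and vice versa (keys double as
    values). Each side keeps a residual connection, and the two enriched views
    are concatenated per token, so an n x d input pair yields an n x 2d output.
    """
    sem_ctx = matmul(attention_weights(sem, syn), syn)
    syn_ctx = matmul(attention_weights(syn, sem), sem)
    fused = []
    for s, sc, y, yc in zip(sem, sem_ctx, syn, syn_ctx):
        fused.append([a + b for a, b in zip(s, sc)] + [a + b for a, b in zip(y, yc)])
    return fused


def difference_loss(sem: Matrix, syn: Matrix) -> float:
    """Assumed difference penalty: squared Frobenius norm of sem^T @ syn.

    It is zero when every semantic feature column is orthogonal to every
    syntactic feature column, which pushes the semantic encoder toward
    information that the syntactic encoder does not already carry.
    """
    corr = matmul(transpose(sem), syn)
    return sum(v * v for row in corr for v in row)
```

For three tokens with d = 2, `co_attention_fuse(sem, syn)` returns a 3 x 4 matrix. In a training loop, such a penalty would typically be added to the decoder's cross-entropy loss with a weighting coefficient; that coefficient is likewise a hypothetical hyperparameter, not a value reported in the paper.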
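The reported gains are measured in F0.5, the metric of the CoNLL-2014 shared task. It is computed from edit-level true positives, false positives and false negatives as F_β = (1 + β²)·P·R / (β²·P + R). With β = 0.5, precision weighs more than recall, which suits GEC because a wrong "correction" harms the text more than a missed one. The function below is a minimal stand-alone sketch of that formula only; the official MaxMatch (M²) scorer also aligns system edits against the gold annotations before counting, which this sketch does not do.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from edit-level counts; beta < 1 favours precision over recall."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, 60 correct edits with 20 spurious and 60 missed ones give P = 0.75 and R = 0.5, so F0.5 ≈ 0.682. Swapping the error counts (P = 0.5, R = 0.75) lowers F0.5 to ≈ 0.536, although F1 would be 0.6 in both cases. A gain of "5.2 percentage points" refers to such scores expressed on a 0 to 100 scale.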

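Cite this article

LUO Song, WANG Chunmei, YUAN Feiniu, DAI Wei. Grammatical Error Correction Model Based on Differential Fusion Syntactic Feature. Computer Systems & Applications, 2023, 32(10): 293-300.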

History
  • Received: 2023-03-24
  • Last revised: 2023-04-28
  • Published online: 2023-07-14
Copyright: Institute of Software, Chinese Academy of Sciences