Received: March 24, 2023 Revised: April 28, 2023
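Chinese abstract (translated): Current English grammatical error correction (GEC) models often ignore the syntactic knowledge in text that benefits error correction, which limits their correction ability. To address this problem, an English GEC model based on the differential fusion of syntactic features is proposed. First, the proposed syntactic encoder not only generates dependency graph and constituency tree information directly from text in an unsupervised way, but also fuses these two heterogeneous syntactic structures and encodes them into a high-dimensional syntactic representation. Second, to exploit both the semantic and the syntactic information in the text, the differential fusion module first uses differential regularization to strengthen the semantic encoder's ability to capture semantic features that the syntactic encoder fails to generate, and then uses co-attention to further fuse the syntactic and semantic representations. The fused representation serves as the output features of the Transformer encoder and is fed to the decoder, which generates grammatically correct text. Comparison experiments on the CoNLL-2014 English error correction task dataset show that the method outperforms the GEC model based on the Copy-Augmented Transformer in both precision and F0.5, with F0.5 improved by 5.2 percentage points. In addition, the syntactic knowledge alleviates the shortage of annotated data and yields better text correction performance.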
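Chinese keywords (translated): natural language processing | grammatical error correction | syntactic knowledge | co-attention | differential fusion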
Abstract: Current English grammatical error correction (GEC) models tend to ignore the syntactic knowledge in text, even though it plays an important role in error correction, and their correction ability suffers as a result. To address this problem, this study proposes a GEC model based on the differential fusion of syntactic features. First, the proposed syntactic encoder generates dependency graph and constituency tree information from raw text in an unsupervised way, then fuses these two heterogeneous syntactic structures and encodes them into a high-dimensional syntactic representation. Second, to exploit both semantic and syntactic information in the text, the differential fusion module first applies differential regularization so that the semantic encoder better captures the semantic features that the syntactic encoder fails to generate. The syntactic and semantic representations are then further fused by cross attention to form the output features of the Transformer encoder, which are passed to the decoder to generate grammatically correct text. Comparison experiments on the CoNLL-2014 task dataset show that the method outperforms the GEC model based on the Copy-Augmented Transformer in both precision and F0.5, with F0.5 improved by 5.2 percentage points. The syntactic knowledge also alleviates the shortage of high-quality annotated training corpora and yields better text error correction performance.
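For readers unfamiliar with the F0.5 metric reported above, the following is a minimal sketch of the standard F-beta formula with beta = 0.5, which weights precision more heavily than recall. The function name `f_beta` and the precision/recall values are illustrative assumptions, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score.

    With beta < 1 (here 0.5), precision counts more than recall,
    which is the usual convention for GEC evaluation.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Illustrative values only (NOT results from the paper).
p, r = 0.6, 0.3
print(f"F0.5 = {f_beta(p, r):.3f}")        # precision-weighted score
print(f"F1   = {f_beta(p, r, 1.0):.3f}")   # balanced score, for comparison
```

Because beta = 0.5 favors precision, a system that proposes fewer but more reliable corrections scores higher under F0.5 than under F1 for the same precision and recall.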
Funding: National Natural Science Foundation of China (62272308)
Citation:
罗松,汪春梅,袁非牛,戴维.基于差分融合句法特征的英语语法纠错模型.计算机系统应用,2023,32(10):293-300
LUO Song, WANG Chun-Mei, YUAN Fei-Niu, DAI Wei. Grammatical Error Correction Model Based on Differential Fusion Syntactic Feature. Computer Systems Applications, 2023, 32(10): 293-300