Language models pre-trained on unstructured text alone provide strong contextual representations for each word, but they do not explicitly expose lexical and syntactic features, which are often the basis for understanding the overall semantics of a text. In this study, we explicitly introduce such features and investigate their effect on the reading comprehension ability of pre-trained models. First, we use part-of-speech tagging and named entity recognition to provide lexical features and dependency parsing to provide syntactic features, and we integrate both with the contextual representations output by the pre-trained model. We then design an adaptive feature fusion method based on the attention mechanism to fuse the different feature types. Experiments on the extractive machine reading comprehension dataset CMRC2018 show that, at very low computational cost, the explicitly introduced lexical and syntactic features improve the model's F1 and EM scores by 0.37% and 1.56%, respectively.
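To make the idea of attention-based adaptive fusion more concrete, the following is a minimal sketch and not the implementation used in this study. It assumes that each token has a contextual vector (here a random stand-in for the pre-trained model output) and one embedding each for its part-of-speech tag, named-entity tag, and dependency relation, all of the same dimension. The function names `fuse_token` and `make_embedding`, the tag sets, the dimension, and the use of plain scaled dot-product attention without learned projections are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_embedding(labels, dim, rng):
    """Hypothetical embedding table: one random vector per label."""
    return {label: [rng.uniform(-1, 1) for _ in range(dim)] for label in labels}


def fuse_token(context_vec, feature_vecs):
    """Fuse one token's feature vectors with scaled dot-product attention.

    The contextual vector acts as the query. The candidates are the
    contextual vector itself plus each explicit feature vector, so the
    weights decide per token how much each feature type contributes.
    Returns the fused vector and the attention weight of each candidate.
    """
    names = ["context"] + list(feature_vecs)
    candidates = [context_vec] + [feature_vecs[name] for name in feature_vecs]
    scale = math.sqrt(len(context_vec))
    weights = softmax([dot(context_vec, c) / scale for c in candidates])
    fused = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(context_vec))
    ]
    return fused, dict(zip(names, weights))


# Illustrative setup (assumed values, not taken from the paper).
rng = random.Random(0)
DIM = 8
pos_emb = make_embedding(["NOUN", "VERB", "PROPN"], DIM, rng)
ner_emb = make_embedding(["O", "LOC"], DIM, rng)
dep_emb = make_embedding(["nsubj", "root", "obj"], DIM, rng)

# Stand-in for the pre-trained model's contextual vector for one token.
context = [rng.uniform(-1, 1) for _ in range(DIM)]

fused, weights = fuse_token(
    context,
    {"pos": pos_emb["PROPN"], "ner": ner_emb["LOC"], "dep": dep_emb["nsubj"]},
)
```

In a trained model the query, keys, and values would normally pass through learned projection matrices, and the fused vector would feed the answer-span prediction layer; both are omitted here to keep the sketch self-contained.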