Received: December 08, 2023; Revised: February 07, 2024
Abstract: Joint entity and relation extraction aims to extract entity-relation triples from text and is one of the most important steps in building a knowledge graph. Existing approaches suffer from weak information representation, poor generalization, entity overlap, and relation redundancy. To address these issues, a joint entity and relation extraction model named RGPNRE is proposed. The RoBERTa pre-trained model is used as the encoder to strengthen the model's representation of information. Adversarial training is introduced during training to improve generalization. A global pointer is used to handle entity overlap. Relation prediction is used to exclude impossible relations, which reduces redundant relations. Entity and relation extraction experiments on CMeIE, a schema-based Chinese medical information extraction dataset, show that the model's F1 score is about 2 percentage points higher than that of the baseline model. With entity pair overlap, the F1 score improves by nearly 10 percentage points, and with single entity overlap it improves by about 1 percentage point. This indicates that the model extracts entity-relation triples more accurately and thereby effectively improves the accuracy of knowledge graph construction. In comparison experiments on sentences containing 1–5 triples, the F1 score improves by about 2 percentage points on sentences with 4 triples and by about 1 percentage point on complex sentences with 5 or more triples, indicating that the model handles complex sentences well.
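Funding: National Natural Science Foundation of China (U21A2013); Open Fund of the Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2018B14)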
Citation:
LI Wen-Chi, LIU Yuan-Xing, CAI Ze-Yu, WU Xiang-Ning, HU Yuan-Jiang, YANG Yi. Joint Entity and Relation Extraction by Integrating Adversarial Training and Global Pointers. COMPUTER SYSTEMS APPLICATIONS, 2024, 33(6): 91-98