Joint Entity and Relation Extraction by Integrating Adversarial Training and Global Pointers

Authors: 李文炽, 刘远兴, 蔡泽宇, 吴湘宁, 胡远江, 杨翼

Fund projects: National Natural Science Foundation of China (U21A2013); Open Fund of the Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2018B14)

Abstract:

Joint entity and relation extraction aims to extract entity-relation triples from text and is one of the most important steps in constructing a knowledge graph. To address problems in entity and relation extraction such as weak information representation, poor generalization, overlapping entities, and redundant relations, this study proposes a joint entity and relation extraction model named RGPNRE. The RoBERTa pre-trained model serves as the encoder to strengthen the model's representation capability; adversarial training is introduced during training to improve generalization; global pointers are used to resolve entity overlap; and relation prediction filters out impossible relations to reduce relation redundancy. Entity and relation extraction experiments on CMeIE, a schema-based Chinese medical information extraction dataset, show that the model improves the F1 score by about 2 percentage points over the baseline. In the entity pair overlap case the F1 score rises by nearly 10 percentage points, and in the single entity overlap case by about 1 percentage point, indicating that the model extracts entity-relation triples more accurately and thus improves the accuracy of knowledge graph construction. In a comparison over sentences containing 1-5 triples, the F1 score increases by about 2 percentage points on sentences with 4 triples and by about 1 percentage point on complex sentences with 5 or more triples, showing that the model handles complex sentence scenarios well.
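The abstract names the model's mechanisms (RoBERTa encoding, adversarial training, global-pointer span scoring, relation prediction) without implementation detail. The sketch below is not the paper's released code; it is a minimal PyTorch illustration, under common conventions, of how a GlobalPointer-style scoring head and an FGM-style adversarial perturbation of the encoder's word embeddings are typically written. All class and parameter names (GlobalPointerHead, FGM, head_size, epsilon, target) are illustrative assumptions, FGM is only one common choice of adversarial training (the paper's exact scheme is not stated in the abstract), and the rotary position embedding of the original GlobalPointer formulation is omitted for brevity.

# Minimal sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn

class GlobalPointerHead(nn.Module):
    """Scores every (start, end) token pair for each type with a query-key dot
    product, so overlapping spans can all be scored independently."""
    def __init__(self, hidden_size: int, num_types: int, head_size: int = 64):
        super().__init__()
        self.num_types, self.head_size = num_types, head_size
        # One query and one key projection per type, produced jointly.
        self.dense = nn.Linear(hidden_size, num_types * head_size * 2)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        b, n, _ = hidden.shape                                # [batch, seq, hidden]
        qk = self.dense(hidden).reshape(b, n, self.num_types, 2, self.head_size)
        q, k = qk[..., 0, :], qk[..., 1, :]                   # [b, seq, types, head]
        # Span score for (i, j): dot product of q_i and k_j, per type.
        logits = torch.einsum("bmth,bnth->btmn", q, k) / self.head_size ** 0.5
        # Mask spans whose end precedes their start (keep the upper triangle).
        mask = torch.tril(torch.ones(n, n, device=hidden.device), diagonal=-1).bool()
        return logits.masked_fill(mask, float("-inf"))        # [b, types, seq, seq]

class FGM:
    """Fast Gradient Method: add a gradient-direction perturbation to the word
    embeddings, run a second forward/backward pass, then restore the weights."""
    def __init__(self, model: nn.Module, epsilon: float = 1.0,
                 target: str = "word_embeddings"):
        self.model, self.epsilon, self.target = model, epsilon, target
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.target in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm > 0:
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

In a typical training step the standard loss is back-propagated first, attack() perturbs the embeddings, the loss from a second forward pass is back-propagated so its gradients accumulate, and restore() returns the embeddings to their original values before the optimizer step.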

Cite this article:

李文炽, 刘远兴, 蔡泽宇, 吴湘宁, 胡远江, 杨翼. Joint entity and relation extraction by integrating adversarial training and global pointers. 计算机系统应用, 2024, 33(6): 91-98.

History
  • Received: 2023-12-08
  • Revised: 2024-02-07
  • Published online: 2024-05-07