Joint Entity Relation Extraction Based on BERT-ancient-Chinese Pre-trained Model
Author:
Funding: National Natural Science Foundation of China (51878536); Science and Technology Plan of the Department of Housing and Urban-Rural Development of Shaanxi Province (2020-K09); Collaborative Innovation Center Fund of the Education Department of Shaanxi Province (23JY038)
    Abstract:

    Ancient Chinese texts carry rich historical and cultural information; extracting entity relations from such texts and constructing the corresponding knowledge graphs plays an important role in cultural inheritance. To address the large number of rare Chinese characters, semantic vagueness, and polysemy in ancient Chinese texts, this study proposes a joint entity relation extraction model based on the BERT-ancient-Chinese pre-trained model (JEBAC). First, a BERT-ancient-Chinese pre-trained model that integrates a BiLSTM network and an attention mechanism (BACBA) identifies all subject and object entities in a sentence, providing a basis for the joint extraction of relations and object entities. Next, the normalized encoding vector of the subject entity is added to the embedding vector of the whole sentence, so that the semantic features of the subject within the sentence are better captured. Finally, the sentence vector carrying subject-entity features is combined with prompt information about the object entity, and BACBA jointly extracts the relation and object entity, yielding all triples (subject entity, relation, object entity) in the sentence. The proposed method is compared with existing methods on DuIE2.0, a Chinese entity relation extraction dataset, and on C-CLUE, a few-shot classical-Chinese entity relation extraction dataset from CCKS 2021. Experimental results show that the method is more effective in extraction performance, with F1 scores of 79.2% and 55.5%, respectively.
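The cascade described in the abstract — tag subject spans first, fuse the normalized subject vector into the sentence representation, then tag relation-specific object spans and score the resulting triples — can be sketched in plain NumPy. This is a minimal illustration under stated assumptions: the function names, the start/end span-pairing heuristic, and the 0.5 threshold are hypothetical, not the paper's implementation; the exact-match triple F1 follows the evaluation convention commonly used for DuIE-style datasets.

```python
import numpy as np

def decode_spans(start_probs, end_probs, thresh=0.5):
    """Pair each start position above threshold with the nearest
    following end position above threshold (binary span tagging)."""
    ends = [i for i, p in enumerate(end_probs) if p >= thresh]
    spans = []
    for s, p in enumerate(start_probs):
        if p < thresh:
            continue
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans

def fuse_subject(sent_emb, span):
    """Add the normalized mean vector of the subject span to every
    token embedding, conditioning later tagging on the subject."""
    s, e = span
    subj = sent_emb[s:e + 1].mean(axis=0)
    subj = subj / (np.linalg.norm(subj) + 1e-9)
    return sent_emb + subj  # broadcasts over all tokens

def triple_f1(pred, gold):
    """Exact-match micro F1 over (subject, relation, object) triples."""
    tp = len(set(pred) & set(gold))
    if not pred or not gold or tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)
```

For example, with start probabilities [0.9, 0.1, 0.6, 0.1] and end probabilities [0.1, 0.8, 0.1, 0.7], `decode_spans` returns the spans [(0, 1), (2, 3)], which would then each be fused into the sentence embedding before the relation/object tagging step.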

Cite this article:

Li ZJ, Yang SJ, Li CH, Zhang J, Dong W, Jie J. Joint entity relation extraction based on BERT-ancient-Chinese pre-trained model. Computer Systems & Applications, 2024, 33(8): 187–195.

History
  • Received: 2024-01-18
  • Revised: 2024-02-26
  • Published online: 2024-07-03