Received: January 18, 2024    Revised: February 26, 2024
Abstract: Ancient Chinese texts carry rich historical and cultural information; studying entity relation extraction for such texts and constructing the corresponding knowledge graphs plays an important role in cultural inheritance. To address the large number of rare Chinese characters, semantic vagueness, and polysemy in ancient Chinese texts, a joint entity relation extraction model based on the BERT-ancient-Chinese pre-trained model (JEBAC) is proposed. First, a BERT-ancient-Chinese pre-trained model integrated with a BiLSTM neural network and an attention mechanism (BACBA) identifies all subject and object entities in a sentence, providing the basis for the joint extraction of relations and object entities. Next, the normalized encoding vector of the subject entity is added to the embedding vector of the whole sentence to better capture the semantic features of the subject entity within the sentence. Finally, the sentence vector carrying the subject-entity features is combined with the prompt information of the object entity, and BACBA jointly extracts the relation and object entity, yielding all triples (subject entity, relation, object entity) in the sentence. The proposed method is compared with existing methods on the Chinese entity relation extraction dataset DuIE2.0 and on C-CLUE, the few-shot classical Chinese entity relation extraction dataset from CCKS 2021. Experimental results show that the method achieves better extraction performance, with F1 scores of 79.2% and 55.5%, respectively.
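The following is a minimal PyTorch sketch of the two-stage pipeline described above, assuming a CasRel-style span-tagging scheme: a BACBA encoder (BERT-ancient-Chinese followed by a BiLSTM and an attention layer) first tags subject spans, then the subject-span vector is added back onto the sentence representation and re-encoded to tag relation-specific object spans. The class names (BACBAEncoder, JEBAC), the hidden size, the relation-specific start/end heads, and the Hugging Face checkpoint id are illustrative assumptions rather than the authors' released implementation, and the object-entity prompt mentioned in the abstract is omitted for brevity.

```python
# Minimal sketch of BACBA and the two-stage joint extraction flow.
# All names, sizes, and the checkpoint id are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel


class BACBAEncoder(nn.Module):
    """BERT-ancient-Chinese + BiLSTM + attention encoder (BACBA)."""

    def __init__(self, pretrained="Jihuai/bert-ancient-chinese", hidden=768):
        super().__init__()
        # Placeholder checkpoint id; substitute the actual BERT-ancient-Chinese
        # weights. `hidden` must match the BERT hidden size (768 for base).
        self.bert = AutoModel.from_pretrained(pretrained)
        self.bilstm = nn.LSTM(hidden, hidden // 2, batch_first=True,
                              bidirectional=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)

    def forward(self, input_ids, attention_mask, extra_embed=None):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        if extra_embed is not None:
            # Inject subject-entity features by adding them to every token vector.
            h = h + extra_embed
        h, _ = self.bilstm(h)
        h, _ = self.attn(h, h, h, key_padding_mask=~attention_mask.bool())
        return h  # (batch, seq_len, hidden)


class JEBAC(nn.Module):
    """Two-stage joint extraction: subjects first, then (relation, object)."""

    def __init__(self, num_relations, hidden=768):
        super().__init__()
        self.encoder = BACBAEncoder(hidden=hidden)
        self.subj_head = nn.Linear(hidden, 2)                 # subject start/end logits
        self.obj_head = nn.Linear(hidden, 2 * num_relations)  # object start/end per relation

    def forward(self, input_ids, attention_mask, subj_span=None):
        # Stage 1: tag candidate subject spans over the plain sentence encoding.
        h = self.encoder(input_ids, attention_mask)
        subj_logits = self.subj_head(h)

        obj_logits = None
        if subj_span is not None:
            # Stage 2 (one subject span per batch, for simplicity): average and
            # normalize the subject-span vectors, broadcast them over the
            # sentence, and re-encode to tag relation-specific object spans.
            start, end = subj_span
            subj_vec = F.normalize(h[:, start:end + 1].mean(dim=1, keepdim=True), dim=-1)
            h2 = self.encoder(input_ids, attention_mask,
                              extra_embed=subj_vec.expand_as(h))
            obj_logits = self.obj_head(h2)
        return subj_logits, obj_logits
```

In this sketch each relation gets its own pair of start/end object taggers, so a sentence yields a triple (subject, relation, object) whenever the stage-2 heads fire for that relation given the detected subject; training would apply binary cross-entropy to both stages jointly.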
Keywords: ancient Chinese text; entity relation extraction; BERT-ancient-Chinese pre-trained model; BiLSTM; attention; triple information
Funding: National Natural Science Foundation of China (51878536); Science and Technology Plan Fund of the Department of Housing and Urban-Rural Development of Shaanxi Province (2020-K09); Collaborative Innovation Center Fund of the Shaanxi Provincial Department of Education (23JY038)
Citation:
LI Zhi-Jie, YANG Sheng-Jie, LI Chang-Hua, ZHANG Jie, DONG Wei, JIE Jun. Joint Entity Relation Extraction Based on BERT-ancient-Chinese Pre-trained Model. Computer Systems & Applications, 2024, 33(8): 187-195