Received: September 7, 2021; Revised: October 11, 2021
Abstract: Electric power production generates large volumes of power-related text, most of which is unstructured, voluminous, and heterogeneous. Organizing and managing this data effectively can help power companies turn their digital assets into products and thereby find new sources of profit growth. To structure the text of regulations in the electric power industry, this study proposes a named entity recognition model based on character and bigram features. The model obtains its embedding representations from a BERT pre-trained language model fused with multiple features, uses a Transformer with relative position encoding as the encoding layer, and uses a conditional random field as the decoding layer. The proposed model achieves an accuracy of 92.64% on entity type recognition.
Keywords: named entity recognition; BERT model; Transformer model; conditional random field
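As a rough illustration of what character and bigram features can look like for a Chinese sentence, the sketch below pairs each character with the bigram that starts at that position. The pairing scheme, the `PAD` end marker, and the function name `char_bigram_features` are assumptions made for this example; the abstract does not specify how the paper constructs or fuses these features.

```python
from typing import List, Tuple

# Hypothetical end-of-sentence marker used to complete the last bigram.
PAD = "</s>"


def char_bigram_features(sentence: str) -> List[Tuple[str, str]]:
    """Return one (character, bigram) pair per character position.

    The bigram at position i is the character at i followed by the
    character at i + 1, or by PAD at the end of the sentence.
    """
    chars = list(sentence)
    features = []
    for i, ch in enumerate(chars):
        nxt = chars[i + 1] if i + 1 < len(chars) else PAD
        features.append((ch, ch + nxt))
    return features


if __name__ == "__main__":
    for ch, bigram in char_bigram_features("电力生产"):
        print(ch, bigram)
```

In a full model, each character and each bigram would typically be mapped to an embedding and combined before the encoding layer; that step is omitted here.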
Funding: National Key Research and Development Program of China (2021YFE0102400)
Cite this article:
CHEN Peng, CAI Bing, HE Xiao-Yong, JIN Zhao-Xuan, JIN Zhi-Gang, HOU Rui. Named Entity Identification for Electric Power Regulations. COMPUTER SYSTEMS APPLICATIONS, 2022, 31(6): 210-216