Named Entity Recognition of Poetry by Integrating Multi-features in Digital Humanities

Fund Project: Ministry of Education Philosophy and Social Sciences Research Later-Stage Project (21JHQ081)

    Abstract:

    In recent years, digital humanities has attracted wide attention, and research on named entity recognition (NER) for ci poetry in digital humanities is emerging. However, few studies have examined the representational power of character features, the accuracy of word segmentation, or the effectiveness of domain knowledge in ci texts. In view of the pictographic nature of Chinese characters and the particularity of ci texts, this paper supplements character features with radical, metrical, and phonological (tone and rhyme) features, and proposes a feature enhancement unit and a feature extraction unit. Knowledge triples about tune pattern titles (cipai) are embedded with the ANALOGY model, and the resulting vectors serve as tune pattern knowledge vectors. The radical, character, metrical, phonological, and tune pattern knowledge vectors are then deeply fused through a bidirectional long short-term memory (BiLSTM) network and an attention mechanism, yielding a multi-feature NER method for ci poetry. Comparative and ablation experiments on a self-built corpus drawn from 《花间集全译》 (a complete translation of Hua Jian Ji, "Among the Flowers") show that the proposed method effectively exploits multiple features to improve recognition performance, reaching an F1 score of 85.63%.
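
    The fusion step named in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: it omits the BiLSTM, the feature enhancement unit, and the feature extraction unit, and shows only a generic softmax attention-weighted sum over the five per-character feature vectors. The function names, the toy 3-dimensional embeddings, and the `query` vector are all hypothetical; in a trained model they would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(feature_vectors, query):
    """Attention-weighted sum of same-dimension feature vectors.

    feature_vectors: dict mapping a feature name to its vector for one character.
    query: vector used to score each feature (hypothetical; learned in practice).
    Returns the fused vector and the attention weight given to each feature.
    """
    dim = len(query)
    names = list(feature_vectors)
    # Scaled dot-product score between the query and each feature vector.
    scores = [
        sum(q * x for q, x in zip(query, feature_vectors[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    fused = [0.0] * dim
    for w, name in zip(weights, names):
        for i, x in enumerate(feature_vectors[name]):
            fused[i] += w * x
    return fused, dict(zip(names, weights))


# Hypothetical toy embeddings for a single character (illustrative values only).
features = {
    "character": [0.2, 0.1, 0.4],
    "radical": [0.0, 0.3, 0.1],
    "metrical": [0.5, 0.0, 0.0],
    "phonological": [0.1, 0.1, 0.1],
    "tune_knowledge": [0.3, 0.2, 0.0],
}
query = [1.0, 0.0, 0.0]  # assumption: stands in for a learned attention query

fused, weights = fuse_features(features, query)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

    Because the weights are non-negative and sum to 1, the fused vector is a convex combination of the inputs, so a feature that scores higher against the query contributes more to the representation passed on to the tagger.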

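Cite this article

Zhang Meng (张朦), Liu Zhongbao (刘忠宝). Named Entity Recognition of Poetry by Integrating Multi-features in Digital Humanities. Computer Systems & Applications (计算机系统应用), 2023, 32(3): 300-308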

History
  • Received: 2022-08-17
  • Last revised: 2022-09-15
  • Published online: 2022-12-02