Named Entity Recognition Technology for Brief Case

Fund Project: Natural Science Foundation of Hunan Province (2018JJ2107); Science and Technology Major Project of Hunan Province (2017SK1040); Science and Technology Program of the Hunan Provincial Public Security Department (2018No.3)

    Abstract:

    A brief case is a short description of a case that public security organs record to improve the quality of the information entered into the Collaborative Case Handling System and to keep information retrieval and case linking efficient. The entities in these descriptions carry a large amount of case information about victims and perpetrators, so mining brief case texts in depth is an effective way to understand how a case unfolded and to analyze it. Entities in brief case texts are densely distributed, nested within one another, and often abbreviated, which makes it hard to capture case entities accurately. To address these characteristics, this study improves the method of character vector generation and proposes the RC-BiLSTM-CRF (Roberta-CNN-BiLSTM-CRF) network architecture. Compared with the mainstream Bert-BiLSTM-CRF architecture, the proposed architecture extracts features from the character vectors, which solves the problem of overly long character vectors produced by the pre-trained model; the resulting reduction in the number of model parameters speeds up convergence of the model as a whole. In the comparative experiments, five mainstream architectures are selected and compared on a brief case dataset provided by the public security organs of Hunan Province. The proposed method achieves the best accuracy, recall, and F1 value, with an F1 value of 88.02%.
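The abstract reports recall and F1 value without saying how they are computed. The sketch below shows one common convention for named entity recognition, exact-match scoring at the entity level over BIO tags, where F1 = 2PR / (P + R). It is an illustration only: the BIO tagging scheme, the exact-match rule, the function names `bio_to_spans` and `precision_recall_f1`, and the `PER`/`LOC` labels are assumptions, not details taken from the paper. Flat BIO tags also cannot represent the nested entities the abstract mentions.

```python
from typing import List, Set, Tuple

# Hypothetical span type: (entity label, start index, end index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (one tag per character)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            # "O" or a stray "I-" tag: no entity is open.
            start, label = None, None
    return spans


def precision_recall_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Entity-level scores; a prediction counts only on an exact span and label match."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans stay distinct.
        gold_spans |= {(idx,) + s for s in bio_to_spans(g)}
        pred_spans |= {(idx,) + s for s in bio_to_spans(p)}
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
    print(precision_recall_f1(gold, pred))
```

In the demo, the predicted location span stops one character early, so under the exact-match assumption it counts as an error for both precision and recall even though it overlaps the gold span.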

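Cite this article

Chen Zhuhui, Liu Xin, Zhang Mingjian, Zhang Dawei (陈柱辉, 刘新, 张明键, 张达为). Named Entity Recognition Technology for Brief Case. Computer Systems & Applications (计算机系统应用), 2022, 31(1): 47-54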

History
  • Received: 2021-03-24
  • Last revised: 2021-04-21
  • Published online: 2021-12-17