Combining RoBERTa with Multi-strategy Recall for Medical Terminology Normalization

Fund Project: National Social Science Fund of China (21BTQ106)

    Abstract:

    Traditional approaches to terminology normalization, such as template matching, hand-crafted features, and semantic matching, often suffer from low term-mapping accuracy and difficulty in aligning terms. Because terms in medical texts tend to be colloquial and expressed in many different ways, this paper combines a multi-strategy recall module with an entailment-based semantic scoring and ranking module to improve medical terminology normalization. The recall module retrieves candidates using the Jaccard similarity coefficient, term frequency-inverse document frequency (TF-IDF), and historical recall. The entailment scoring module uses RoBERTa-wwm-ext as the semantic scoring model. The usability of the method is validated, for the first time, on a Chinese dataset annotated by medical professionals against the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) standard. Experiments show that, when processing medical knowledge features, the method achieves favorable results in practical medical terminology normalization and offers good generalization and practical value.
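
    The recall-then-rank pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function and class names are hypothetical, candidates are compared on character sets (Jaccard) and character bigrams (TF-IDF), the history is modeled as a plain dictionary from previously seen mentions to the standard terms they were mapped to, and `scorer` is a placeholder for whatever model assigns the entailment score (RoBERTa-wwm-ext in the paper).

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams; a string shorter than n is kept as a single gram."""
    text = text.strip().lower()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def jaccard(a, b):
    """Jaccard similarity between the character sets of two strings."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


class TfidfIndex:
    """Tiny TF-IDF index over character n-grams with cosine scoring."""

    def __init__(self, terms, n=2):
        self.terms = list(terms)
        self.n = n
        docs = [Counter(char_ngrams(t, n)) for t in self.terms]
        df = Counter(g for d in docs for g in d)  # document frequency per gram
        total = len(self.terms)
        # Smoothed IDF (an assumed variant, chosen to avoid zero weights).
        self.idf = {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}
        self.vectors = [self._normalize(d) for d in docs]

    def _normalize(self, counts):
        # Grams unseen at indexing time are ignored.
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def score(self, query):
        """Cosine similarity of the query against every indexed term."""
        q = self._normalize(Counter(char_ngrams(query, self.n)))
        return [sum(w * v.get(g, 0.0) for g, w in q.items()) for v in self.vectors]


def recall_candidates(mention, terms, index, history, k=3):
    """Union of history, Jaccard top-k and TF-IDF top-k, without duplicates."""
    by_jaccard = sorted(terms, key=lambda t: jaccard(mention, t), reverse=True)[:k]
    scored = sorted(zip(index.score(mention), index.terms),
                    key=lambda p: p[0], reverse=True)[:k]
    by_tfidf = [t for _, t in scored]
    merged = []
    for t in list(history.get(mention, [])) + by_jaccard + by_tfidf:
        if t not in merged:
            merged.append(t)
    return merged


def rerank(mention, candidates, scorer):
    """Order candidates by scorer(mention, candidate), highest first."""
    return sorted(candidates, key=lambda c: scorer(mention, c), reverse=True)
```

    In use, one would build `TfidfIndex` once over the list of standard terms, call `recall_candidates` for each raw mention, and pass the merged list to `rerank` with a scorer backed by a fine-tuned sentence-pair model. Merging by union keeps a candidate that any single strategy finds, which matters when a colloquial mention shares few characters with its standard term and only the history lookup can surface it.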

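Cite this article

Han Zhenqiao, Fu Lijun, Liu Junming, Guo Yujie, Tang Keke, Liang Rui. Combining RoBERTa with Multi-strategy Recall for Medical Terminology Normalization. Computer Systems & Applications, 2022, 31(10): 245-253.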

History
  • Received: 2022-01-14
  • Last revised: 2022-02-15
  • Published online: 2022-07-07