Two-stage Medical Terminology Standardization Based on RoBERTa and T5

    Abstract:

    Medical terminology standardization, an important means of eliminating entity ambiguity, is widely used in building knowledge graphs. The medical field involves a large number of specialized terms and complex expressions, so traditional matching models often struggle to achieve high accuracy. To address this, a two-stage model of semantic recall followed by precise ranking is proposed to improve medical terminology standardization. First, in the semantic recall stage, a semantic representation model, CL-BERT, is proposed, built on RoBERTa-wwm and an improved form of supervised contrastive learning. CL-BERT generates a semantic representation vector for each entity, and standard terms are recalled according to the cosine similarity between vectors, yielding a candidate set of standard terms. Second, in the precise ranking stage, T5 combined with prompt tuning is used to build a precise semantic matching model, which is trained with FGM (fast gradient method) adversarial training. This matching model then ranks the candidate standard terms against the original term to obtain the final standard term. Experiments on the public CCKS 2019 dataset achieve an F1 score of 0.9206. The results show that the proposed two-stage model performs well and offers a new approach to medical terminology standardization.
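    The two stages described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `encode` is a toy character-bigram encoder standing in for CL-BERT, `match_score` is a character-overlap score standing in for the prompt-tuned T5 matching model, and the example terms are invented. Only the pipeline shape follows the abstract: recall the top-k standard terms by cosine similarity, then rerank that candidate set with a pairwise scorer. `fgm_perturbation` shows the standard FGM perturbation r_adv = epsilon * g / ||g||_2 that is added to a model's embeddings during adversarial training; the paper's exact FGM settings are not given here.

```python
import math
from collections import Counter


# Hypothetical stand-in for CL-BERT: a bag of character bigrams.
# The paper encodes terms with RoBERTa-wwm trained via supervised
# contrastive learning; this toy encoder only mimics the interface.
def encode(text: str) -> Counter:
    padded = f"^{text}$"  # boundary markers so first/last characters count
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recall(query: str, standard_terms: list[str], k: int) -> list[str]:
    """Stage 1: keep the k standard terms whose vectors are most cosine-similar."""
    q = encode(query)
    ranked = sorted(standard_terms, key=lambda t: cosine(q, encode(t)), reverse=True)
    return ranked[:k]


# Hypothetical stand-in for the T5 matching model: any function that
# scores an (original term, candidate) pair. Here: character-set Jaccard.
def match_score(original: str, candidate: str) -> float:
    a, b = set(original), set(candidate)
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(original: str, candidates: list[str]) -> str:
    """Stage 2: return the candidate with the highest pairwise score."""
    return max(candidates, key=lambda c: match_score(original, c))


def fgm_perturbation(grad: list[float], epsilon: float = 1.0) -> list[float]:
    """FGM: r_adv = epsilon * g / ||g||_2.

    In training, r_adv is added to the embedding weights, the adversarial
    loss is back-propagated, and the original weights are then restored.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


if __name__ == "__main__":
    standard = ["type 2 diabetes mellitus", "type 1 diabetes mellitus",
                "essential hypertension", "acute gastritis"]
    candidates = recall("type 2 diabetes", standard, k=2)
    print(candidates)  # ['type 2 diabetes mellitus', 'type 1 diabetes mellitus']
    print(rerank("type 2 diabetes", candidates))  # type 2 diabetes mellitus
```

    Splitting the work this way keeps the more expensive pairwise model off the full standard vocabulary: only the k recalled candidates are scored in the second stage.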

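Citation:

Zhou Jing, Cui Cancan, Wang Mengdi, Wang Zemin. Two-stage Medical Terminology Standardization Based on RoBERTa and T5. Computer Systems & Applications (计算机系统应用), 2024, 33(1): 280-288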

History
  • Received: May 18, 2023
  • Revised: June 26, 2023
  • Online: November 24, 2023
  • Published: January 5, 2024
Copyright: Institute of Software, Chinese Academy of Sciences