Abstract:Medical terminology standardization, as an important means to eliminate entity ambiguity, is widely used in the process of building knowledge graphs. Aiming at the problem that the medical field involves a large number of professional terminology and complex expressions, and the traditional matching models are often difficult to achieve a high accuracy rate, a two-stage model of semantic recall and precise sorting is proposed to improve the standardization effect of medical terminology. First, in the semantic recall stage, a semantic representation model CL-BERT is proposed based on the improved supervised contrastive learning and RoBERTa-wwm. The semantic representation vector of an entity is generated through CL-BERT, and recall is carried out according to the cosine similarity between the vectors, so as to obtain the standard word candidate set. Secondly, in the precise sorting stage, T5, combined with prompt tuning, is used to build a precise semantic matching model, and FGM confrontation training is applied to the model training; next, the precise matching model is used to precisely sort the original word and standard word candidate sets, so as to obtain the final standard words. The ccks2019 public data set is used for experiments, achieving an F1 value of 0.920 6. The experimental results show that the proposed two-stage model showcases high performance, and provides a new idea for medical terminology standardization.