Received: July 31, 2018; Revised: August 30, 2018
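Chinese abstract (translated): This paper studies named entity recognition on medical texts produced in online consultations. Using data from an online medical question-answering website, a dataset is built with the {B, I, O} annotation scheme, and four types of medical entities are extracted: disease, treatment, examination, and symptom. With BiLSTM-CRF as the baseline model, two deep learning models, IndRNN-CRF and IDCNN-BiLSTM-CRF, are proposed, and their effectiveness is verified on the self-built dataset. Experimental comparison of the two proposed models with the baseline shows that IDCNN-BiLSTM-CRF reaches an F1 score of 0.8116, exceeding the F1 score of 0.8009 for BiLSTM-CRF, so IDCNN-BiLSTM-CRF performs better overall than BiLSTM-CRF. IndRNN-CRF reaches a precision of 0.8427, but its recall is lower than that of the baseline BiLSTM-CRF.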
Abstract: This paper studies named entity recognition on medical texts generated by online medical consultation. Using data from an online medical question-answering website, we build a dataset with the {B, I, O} annotation scheme and extract four types of medical entities: disease, treatment, examination, and symptom. Taking BiLSTM-CRF as the baseline model, we propose two deep learning models, IndRNN-CRF and IDCNN-BiLSTM-CRF, and verify their effectiveness on the self-built dataset. Experimental comparison with the baseline shows that IDCNN-BiLSTM-CRF achieves an F1 score of 0.8165, exceeding the F1 score of 0.8009 for BiLSTM-CRF, so its overall performance is better than that of BiLSTM-CRF. IndRNN-CRF achieves a high precision of 0.8427, but its recall is lower than that of the baseline BiLSTM-CRF.
Keywords: medical question and answer; deep learning; Independent Recurrent Neural Network (IndRNN); dilated convolution; bi-directional RNN
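As a rough illustration of the {B, I, O} annotation scheme and the entity-level precision, recall, and F1 metrics mentioned in the abstract, the following minimal Python sketch decodes a tag sequence into entity spans and scores a prediction against a gold annotation. The tag names (`B-SYMPTOM`, `I-DISEASE`, and so on), the function names, and the exact-match scoring rule are assumptions made for this example; the abstract does not specify the label strings or the evaluation procedure used by the authors.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end), end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into entity spans.

    Tag strings such as "B-DISEASE" are hypothetical label names.
    """
    spans: List[Span] = []
    start, etype = None, None
    # A trailing "O" sentinel flushes an entity that ends the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An "I-" tag with no matching open entity is ignored here.
    return spans


def precision_recall_f1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Entity-level scores, assuming exact match on type and boundaries."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Toy example with made-up tags (not data from the paper).
gold_tags = ["B-SYMPTOM", "I-SYMPTOM", "O", "B-DISEASE", "O", "B-TREATMENT", "I-TREATMENT"]
pred_tags = ["B-SYMPTOM", "I-SYMPTOM", "O", "O", "O", "B-TREATMENT", "I-TREATMENT"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
p, r, f1 = precision_recall_f1(gold_spans, pred_spans)
```

In this toy case the prediction misses one of three gold entities, so precision stays at 1.0 while recall drops, the same kind of precision and recall trade-off the abstract reports for IndRNN-CRF against the baseline.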
Cite this article:
杨文明,褚伟杰.在线医疗问答文本的命名实体识别.计算机系统应用,2019,28(2):8-14
YANG Wen-Ming, CHU Wei-Jie. Named Entity Recognition of Online Medical Question Answering Text. Computer Systems Applications, 2019, 28(2): 8-14