Electronic medical records document a patient's health condition over the course of diagnosis and treatment. Their text contains a large number of medical entities that carry rich medical information. Existing relation extraction models in the medical domain mainly rely on relation classification, identifying the semantic relation between two given medical entities. Chinese electronic medical records, however, are characterized by a dense distribution of entities in the text. To address this problem, this paper proposes a relation triple recognition method based on condition hints and sequence labeling, which converts relation triple recognition into a sequence labeling task. The head entity and relation type of a triple serve as the condition hint, and sequence labeling is used to identify the tail entities in the record text that are associated with that hint. Experiments on a Chinese electronic medical record dataset show that the method can effectively recognize relation triples in Chinese electronic medical records.
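As a rough illustration of the task conversion described above, the following Python sketch shows one way a condition hint could be prepended to the record text and how a BIO tag sequence could be decoded into tail entities. It is only a sketch under stated assumptions: the separator token `[SEP]`, the relation label `disease-symptom`, the character-level B/I/O tag scheme, the example sentence, and the function names `build_conditioned_input`, `decode_tails`, and `to_triples` are all hypothetical, and the tag sequence is written by hand in place of the output of a trained tagger. The actual input encoding and tagging scheme of the proposed method may differ.

```python
from typing import List, Tuple

SEP = "[SEP]"  # hypothetical separator token, not taken from the paper


def build_conditioned_input(head: str, relation: str, text: str) -> List[str]:
    """Prepend the condition hint (head entity + relation type) to the text.

    The text is split into characters, a common choice for Chinese.
    """
    return list(head) + [SEP, relation, SEP] + list(text)


def decode_tails(chars: List[str], tags: List[str]) -> List[Tuple[str, int, int]]:
    """Turn a B/I/O tag sequence over `chars` into (entity, start, end) spans."""
    spans = []
    start = None
    # A trailing "O" sentinel closes any entity that runs to the end of the text.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag != "I" and start is not None:
            spans.append(("".join(chars[start:i]), start, i))
            start = None
        if tag == "B":
            start = i
    return spans


def to_triples(head: str, relation: str,
               spans: List[Tuple[str, int, int]]) -> List[Tuple[str, str, str]]:
    """Combine the condition hint with each decoded tail entity."""
    return [(head, relation, tail) for tail, _, _ in spans]


# Illustrative sentence: "patient has cough accompanied by fever".
text = "患者咳嗽伴发热"
head, relation = "肺炎", "disease-symptom"  # head entity: "pneumonia"

tokens = build_conditioned_input(head, relation, text)

# Hand-written tags standing in for a tagger's predictions over `text`.
tags = ["O", "O", "B", "I", "O", "B", "I"]

spans = decode_tails(list(text), tags)
triples = to_triples(head, relation, spans)
print(triples)
```

Because the tagger labels every position of the text for a given hint, a single pass can return several tail entities for the same head entity and relation type, which is the property that matters when entities are densely distributed.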