Computer Systems & Applications, 2018, 27(8): 153-158
Semantic Relationship Recognition of Oil Documents Based on Improved Word Vector (基于改进词向量的石油文档语义关系识别)
GONG Fa-Ming, ZHU Peng-Hai (宫法明, 朱朋海)
(College of Computer & Communication Engineering, China University of Petroleum (East China), Qingdao 266580, China)
Received: 2017-12-10    Revised: 2018-01-04
Abstract: Semantic relationship recognition is the process of analyzing documents to identify the semantic relations they contain, and it is an important step in ontology construction. When building an ontology for the petroleum domain, this task is harder than usual because petroleum documents contain many compound terms. Existing semantic recognition algorithms are mainly based on association rules and are not tailored to a specific domain. By analyzing the characteristics of petroleum documents, this study proposes a semantic relationship recognition algorithm for petroleum documents based on improved word vectors. Starting from the Continuous Bag-Of-Words (CBOW) model, it extends training to cover petroleum terminology and introduces negative sampling and subsampling to improve training accuracy and efficiency. The resulting vector features are then used to train a Support Vector Machine (SVM) classifier for semantic relationship recognition. The experimental results show that word vectors trained in this way can accurately identify semantic relations in the petroleum domain and offer a clear advantage there.
Keywords: word vector; semantic relationship recognition; SVM
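The negative sampling and subsampling steps named in the abstract can be illustrated with a minimal sketch. It follows the formulation commonly used in word2vec (discard probability 1 - sqrt(t / f(w)) for subsampling, and a unigram distribution raised to the power 0.75 for drawing negatives); the abstract does not state which variant or parameters the authors used, so the threshold, the exponent, the function names, and the toy corpus below are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def keep_probability(freq, threshold=1e-5):
    """Probability of keeping a word whose relative frequency is `freq`.

    Assumed rule (original word2vec formulation): a word is discarded
    with probability 1 - sqrt(threshold / freq), so rare words are
    always kept and frequent words are thinned out.
    """
    return min(1.0, math.sqrt(threshold / freq))


def subsample(tokens, threshold=1e-5, rng=random):
    """Randomly drop frequent tokens before building CBOW training windows."""
    counts = Counter(tokens)
    total = len(tokens)
    return [
        w for w in tokens
        if rng.random() < keep_probability(counts[w] / total, threshold)
    ]


def noise_distribution(tokens, power=0.75):
    """Unigram distribution raised to `power`, used to draw negative samples."""
    counts = Counter(tokens)
    weights = {w: c ** power for w, c in counts.items()}
    norm = sum(weights.values())
    return {w: x / norm for w, x in weights.items()}


def draw_negatives(dist, target, k, rng=random):
    """Draw k negative words from `dist`, never returning the target itself."""
    words = [w for w in dist if w != target]
    probs = [dist[w] for w in words]
    return rng.choices(words, weights=probs, k=k)


# Hypothetical toy corpus; a real run would use segmented petroleum documents.
corpus = ["the", "drilling", "fluid", "the", "reservoir",
          "the", "drilling", "rig", "the", "wellbore"]

dist = noise_distribution(corpus)
negatives = draw_negatives(dist, target="drilling", k=3, rng=random.Random(0))
thinned = subsample(corpus, threshold=0.1, rng=random.Random(0))
```

Raising counts to the power 0.75 flattens the distribution, so rare domain terms are drawn as negatives somewhat more often than their raw frequency would suggest. The subsampling threshold of 0.1 is chosen only so that the effect is visible on a ten-token corpus; far smaller values are typical for large corpora.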
Funding: Innovation Method Special Project of the Ministry of Science and Technology (2015IM01030)
Cite this article:
GONG Fa-Ming, ZHU Peng-Hai. Semantic Relationship Recognition of Oil Documents Based on Improved Word Vector. Computer Systems & Applications, 2018, 27(8): 153-158