Computer Systems Applications, 2021, 30(10): 59-67
Construction of Discipline Knowledge Graph for Multi-Source Heterogeneous Data Sources
(College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China)
Received: January 11, 2021    Revised: February 23, 2021
Abstract: Discipline information that is stored in scattered form is difficult to aggregate and tally. To address this problem, a knowledge graph of the computer discipline in universities is constructed by fusing multi-source heterogeneous discipline data on the basis of a domain ontology model of the computer discipline. First, domain knowledge is acquired from relevant websites and existing documents using Web crawlers and related techniques, and the data are cleaned with a BERT-based model. Then, Word2Vec is used to measure the similarity between the research directions of researchers, which resolves the entity alignment problem. Finally, the data are imported into the Neo4j graph database for storage. A visualization system for the computer discipline is built on the resulting knowledge graph. It provides information retrieval, graph display, and other functions, and supports quick querying and resource statistics over basic discipline data, with the aim of making subsequent discipline evaluation work more efficient.
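The entity alignment step described in the abstract can be illustrated with a minimal sketch. It assumes that two records refer to the same researcher when their names match and the cosine similarity of their research-direction vectors exceeds a threshold; the abstract only states that Word2Vec similarity between research directions is used, so the name check, the averaging of word vectors, the threshold of 0.9, and the toy vectors below are all assumptions made for illustration, not the authors' implementation. In practice the vectors would come from a trained Word2Vec model.

```python
import math

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "machine": [0.9, 0.1, 0.0],
    "learning": [0.8, 0.2, 0.1],
    "deep": [0.85, 0.15, 0.05],
    "database": [0.1, 0.9, 0.1],
    "systems": [0.2, 0.8, 0.2],
}


def mean_vector(phrase, vectors):
    """Average the vectors of the known words in a research-direction phrase."""
    words = [w for w in phrase.lower().split() if w in vectors]
    if not words:
        return None  # no known word, so no vector can be built
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_person(rec_a, rec_b, vectors, threshold=0.9):
    """Assumed alignment rule: same name and similar research direction."""
    if rec_a["name"] != rec_b["name"]:
        return False
    va = mean_vector(rec_a["direction"], vectors)
    vb = mean_vector(rec_b["direction"], vectors)
    if va is None or vb is None:
        return False
    return cosine(va, vb) >= threshold


# Hypothetical records from two different data sources.
a = {"name": "Researcher A", "direction": "machine learning"}
b = {"name": "Researcher A", "direction": "deep learning"}
c = {"name": "Researcher A", "direction": "database systems"}
print(same_person(a, b, TOY_VECTORS))
print(same_person(a, c, TOY_VECTORS))
```

With these toy vectors, records `a` and `b` are merged while `a` and `c` are kept apart despite sharing a name, which is the disambiguation behavior that a similarity threshold is meant to provide.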
Funding: National Natural Science Foundation of China (61572522); Graduate Innovation Project of China University of Petroleum (East China) (YCX2021128)
Citation:
LI Jia-Rui, LI Hua-Yu, YAN Yang. Construction of Discipline Knowledge Graph for Multi-Source Heterogeneous Data Sources. COMPUTER SYSTEMS APPLICATIONS, 2021, 30(10): 59-67