Construction of Discipline Knowledge Graph for Multi-Source Heterogeneous Data Sources
    Abstract:

    Discipline information is stored in scattered sources, which makes it difficult to compile statistics on it. To address this problem, a knowledge graph of the computer discipline in universities is constructed by integrating multi-source heterogeneous data on the basis of a domain ontology model of the computer discipline. First, domain knowledge is acquired from relevant websites and existing documents using Web crawlers and other tools, and the data are cleaned with a BERT-based model. Then, Word2Vec is used to measure the similarity between the research directions of people, which resolves the entity alignment problem. Finally, the data are imported into the Neo4j graph database for knowledge storage. On top of the knowledge graph, a visualization system for the computer discipline is built. It supports information retrieval, graphical display, and other functions, and enables quick queries and resource statistics over computer discipline data. The system is expected to make follow-up discipline evaluation work more efficient.
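
    The entity alignment step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for trained Word2Vec embeddings, and the exact-name check, the averaging of keyword vectors, the 0.9 threshold, and the names `TOY_VECTORS`, `mean_vector`, `cosine`, and `same_person` are all assumptions made for illustration. The idea shown is that two records with the same name are merged only if their research directions are close in embedding space.

```python
import math

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "knowledge": [0.9, 0.1, 0.0],
    "graph": [0.8, 0.2, 0.1],
    "ontology": [0.7, 0.3, 0.0],
    "image": [0.0, 0.1, 0.9],
    "segmentation": [0.1, 0.0, 0.8],
}


def mean_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_person(name_a, dirs_a, name_b, dirs_b,
                vectors=TOY_VECTORS, threshold=0.9):
    """Assumed rule: same name and similar research directions => same entity."""
    if name_a != name_b:
        return False
    vec_a = mean_vector(dirs_a, vectors)
    vec_b = mean_vector(dirs_b, vectors)
    if vec_a is None or vec_b is None:
        return False  # not enough evidence to merge
    return cosine(vec_a, vec_b) >= threshold


if __name__ == "__main__":
    # Two records sharing a name, drawn from different hypothetical sources.
    print(same_person("Li Wei", ["knowledge", "graph"],
                      "Li Wei", ["ontology", "graph"]))
    print(same_person("Li Wei", ["knowledge", "graph"],
                      "Li Wei", ["image", "segmentation"]))
```

    In practice the vectors would come from a Word2Vec model trained on a relevant corpus, and the threshold would need tuning against labeled pairs.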

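Citation

Li Jiarui, Li Huayu, Yan Yang. Construction of Discipline Knowledge Graph for Multi-Source Heterogeneous Data Sources. Computer Systems & Applications, 2021, 30(10): 59–67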

History
  • Received: January 11, 2021
  • Revised: February 23, 2021
  • Online: October 8, 2021