Text Case Classification of Safety Production Accidents Based on Word2Vec Word Embedding and Clustering Model
Authors: Wu Deping, Hua Gang
Abstract:

Analyzing safety production accidents is important for improving emergency management capability. Building on semantic analysis of safety production cases, this paper combines Word2Vec word embedding with a clustering model and trains the word vectors using CBOW with negative sampling. To match the data characteristics of accident-case classification, it adopts a semi-supervised clustering approach. Based on the nature of the accidents, it proposes an optimized algorithm for choosing the initial cluster centers, and then applies the K-means clustering algorithm to classify the safety accident text cases. Experimental results show that the proposed method can classify accident cases and can serve as a reference for multi-dimensional accident analysis.
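The abstract states that the word vectors are trained with CBOW plus negative sampling but gives no hyperparameters. Below is a minimal plain-Python sketch of one training step of that objective. It is an illustration of the technique, not the authors' code. The context vectors are averaged into a hidden vector, the true target word is pushed toward a score of 1 and each sampled negative toward 0, and negatives are drawn from the unigram distribution raised to the 3/4 power, as in the original word2vec. The function names, toy vocabulary, embedding dimension, and learning rate are all hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function used by the negative-sampling objective."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_negatives(counts, k, exclude, rng):
    """Draw k negative word ids from the unigram distribution raised to 3/4.

    counts[i] is the corpus frequency of word i; `exclude` (the target) is skipped.
    """
    weights = [c ** 0.75 for c in counts]
    negatives = []
    while len(negatives) < k:
        word_id = rng.choices(range(len(counts)), weights=weights)[0]
        if word_id != exclude:
            negatives.append(word_id)
    return negatives


def cbow_neg_step(w_in, w_out, context_ids, target_id, neg_ids, lr):
    """Apply one SGD step of CBOW with negative sampling; return the pre-update loss.

    w_in holds the input (context) vectors, w_out the output (target) vectors.
    """
    dim = len(w_in[0])
    n_ctx = len(context_ids)
    # CBOW hidden layer: the average of the context word vectors.
    h = [sum(w_in[c][d] for c in context_ids) / n_ctx for d in range(dim)]
    grad_h = [0.0] * dim
    loss = 0.0
    # The true target is a positive example (label 1); each negative has label 0.
    for word_id, label in [(target_id, 1)] + [(n, 0) for n in neg_ids]:
        v = w_out[word_id]
        p = sigmoid(sum(h[d] * v[d] for d in range(dim)))
        loss -= math.log(p) if label == 1 else math.log(1.0 - p)
        g = p - label  # derivative of the loss w.r.t. the score h . v
        for d in range(dim):
            grad_h[d] += g * v[d]  # uses v before its update
            v[d] -= lr * g * h[d]
    # h is a mean, so each context vector receives 1/n_ctx of the hidden gradient.
    for c in context_ids:
        for d in range(dim):
            w_in[c][d] -= lr * grad_h[d] / n_ctx
    return loss
```

In practice a library such as gensim (`Word2Vec(..., sg=0, negative=5)`) runs this loop over a whole segmented corpus; the hand-written version only makes the update rule visible.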

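The abstract says that the initial cluster centers are chosen by a semi-supervised algorithm that uses the nature of the accidents, after which K-means is run. It does not give the selection rule. As a hedged stand-in, the sketch below uses seeded K-means. Each initial center is the mean vector of a few cases already labeled with an accident type, and standard K-means then refines the partition. The sketch also assumes that each case text is represented by the average of its word vectors. The helper names, the toy two-dimensional vectors, and the seed labels are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def doc_vector(tokens, word_vecs, dim):
    """Represent a case text as the average of its in-vocabulary word vectors."""
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    return mean_vector(known) if known else [0.0] * dim


def seeded_centers(vectors, seeds):
    """Initial centers: the mean vector of the labeled seed cases of each class.

    `seeds` maps an accident-type label to indices of hand-labeled cases.
    """
    labels = sorted(seeds)
    centers = [mean_vector([vectors[i] for i in seeds[lab]]) for lab in labels]
    return labels, centers


def nearest(v, centers):
    """Index of the center closest to v in Euclidean distance."""
    return min(range(len(centers)), key=lambda k: math.dist(v, centers[k]))


def kmeans(vectors, centers, max_iter=100):
    """Lloyd's K-means iterations starting from the given initial centers."""
    centers = [list(c) for c in centers]
    assign = None
    for _ in range(max_iter):
        new_assign = [nearest(v, centers) for v in vectors]
        if new_assign == assign:  # assignments are stable: converged
            break
        assign = new_assign
        for k in range(len(centers)):
            members = [vectors[i] for i, a in enumerate(assign) if a == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = mean_vector(members)
    return assign, centers
```

Seeding fixes both the number of clusters and the correspondence between each cluster and an accident type, which random initialization does not guarantee.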
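Citation

Wu Deping, Hua Gang. Text Case Classification of Safety Production Accidents Based on Word2Vec Word Embedding and Clustering Model. Computer Systems & Applications, 2021, 30(1): 141–145.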

Article Metrics
  • Abstract: 891
  • PDF: 2348
  • HTML: 1507
  • Cited by: 0
History
  • Received: May 04, 2020
  • Revised: June 10, 2020
  • Online: December 31, 2020
Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing, Postal Code: 100190
Phone: 010-62661041 Email: csa (a) iscas.ac.cn