Clustering Short Text Classification Based on Fusion of BERT and GSDMM
Authors: Liu Hao, Wang Yuchen
    Abstract:

    In text classification, traditional natural language processing methods perform poorly on short texts because short texts have sparse features and irregular wording. To address these characteristics, this study proposes a short text classification algorithm that fuses bidirectional encoder representations from Transformers (BERT) with the collapsed Gibbs sampling algorithm for the Dirichlet multinomial mixture model (GSDMM) and adds clustering guidance, with the aim of improving the effectiveness and accuracy of short text classification. First, the fused BERT-GSDMM model converts each short text into an integrated semantic vector. This vector captures both global semantic features and topic features, addressing the sparsity of short text features and the lack of topic information. Next, a clustering guidance algorithm is applied before classifier training; it expands the labeled data and improves the interpretability of the results. Finally, the expanded labeled data set is used to train the classifier, which then classifies short texts automatically. Using negative comments from an e-commerce platform as the validation data set, multiple groups of comparative experiments verify the effectiveness and advantages of the algorithm for short text classification.
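The abstract names the building blocks but not the exact fusion rule or clustering-guidance procedure, so the following is only a minimal sketch under stated assumptions. The `GSDMM` class follows the usual collapsed Gibbs update for the Dirichlet multinomial mixture model, in which each short text belongs to exactly one cluster. The fused vector is *assumed* to be a plain concatenation of a BERT sentence embedding with the text's GSDMM cluster distribution. `expand_labels` is a *hypothetical* majority-vote propagation that copies a class to the unlabeled texts in a cluster when that cluster's labeled texts mostly agree on it. The names `fuse`, `expand_labels`, `topic_weight`, and `min_purity` are illustrative and do not come from the paper. A real BERT embedding (for example, a 768-dimensional vector) is replaced here by a toy two-dimensional vector.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence


class GSDMM:
    """Collapsed Gibbs sampler for the Dirichlet multinomial mixture model (one topic per text)."""

    def __init__(self, K: int = 8, alpha: float = 0.1, beta: float = 0.1,
                 n_iters: int = 15, seed: int = 0):
        self.K, self.alpha, self.beta, self.n_iters = K, alpha, beta, n_iters
        self.rng = random.Random(seed)

    def _update(self, doc: Counter, z: int, sign: int) -> None:
        # Add (sign=+1) or remove (sign=-1) one document's counts from cluster z.
        self.m_z[z] += sign
        self.n_z[z] += sign * sum(doc.values())
        for w, c in doc.items():
            self.n_zw[z][w] += sign * c

    def _log_score(self, doc: Counter, z: int) -> float:
        # Unnormalized log p(z_d = z | other assignments); the denominator
        # (D - 1 + K * alpha) is the same for every z, so it is dropped.
        score = math.log(self.m_z[z] + self.alpha)
        for w, c in doc.items():
            for j in range(c):
                score += math.log(self.n_zw[z][w] + self.beta + j)
        for i in range(sum(doc.values())):
            score -= math.log(self.n_z[z] + self.V * self.beta + i)
        return score

    def _posterior(self, doc: Counter) -> List[float]:
        # Normalize in log space to avoid underflow on longer texts.
        logs = [self._log_score(doc, z) for z in range(self.K)]
        top = max(logs)
        exps = [math.exp(s - top) for s in logs]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, docs: List[List[str]]) -> List[int]:
        self.docs = [Counter(d) for d in docs]
        self.V = len({w for d in docs for w in d})
        self.m_z = [0] * self.K       # documents per cluster
        self.n_z = [0] * self.K       # word tokens per cluster
        self.n_zw = [Counter() for _ in range(self.K)]  # word counts per cluster
        self.labels = [self.rng.randrange(self.K) for _ in self.docs]
        for doc, z in zip(self.docs, self.labels):
            self._update(doc, z, +1)
        for _ in range(self.n_iters):
            for d, doc in enumerate(self.docs):
                self._update(doc, self.labels[d], -1)
                z = self.rng.choices(range(self.K), weights=self._posterior(doc))[0]
                self.labels[d] = z
                self._update(doc, z, +1)
        return self.labels

    def topic_vector(self, d: int) -> List[float]:
        # p(z | d) for training document d, computed with d held out of its cluster.
        doc = self.docs[d]
        self._update(doc, self.labels[d], -1)
        probs = self._posterior(doc)
        self._update(doc, self.labels[d], +1)
        return probs


def fuse(bert_vec: Sequence[float], topic_vec: Sequence[float],
         topic_weight: float = 1.0) -> List[float]:
    # Assumed fusion rule: concatenate the sentence embedding with the weighted topic vector.
    return list(bert_vec) + [topic_weight * p for p in topic_vec]


def expand_labels(clusters: List[int], labels: Dict[int, str],
                  min_purity: float = 0.8) -> Dict[int, str]:
    # Hypothetical clustering guidance: when the labeled members of a cluster mostly
    # agree on one class, copy that class to the cluster's unlabeled members.
    expanded = dict(labels)
    members: Dict[int, List[int]] = {}
    for d, z in enumerate(clusters):
        members.setdefault(z, []).append(d)
    for docs_in_z in members.values():
        votes = Counter(labels[d] for d in docs_in_z if d in labels)
        if not votes:
            continue
        cls, n = votes.most_common(1)[0]
        if n / sum(votes.values()) >= min_purity:
            for d in docs_in_z:
                expanded.setdefault(d, cls)
    return expanded


if __name__ == "__main__":
    docs = [["late", "delivery"], ["delivery", "slow"], ["screen", "broken"],
            ["broken", "screen", "cracked"]]
    model = GSDMM(K=4, n_iters=20, seed=1)
    clusters = model.fit(docs)
    print(clusters, fuse([0.1, -0.2], model.topic_vector(0)))  # toy 2-d "BERT" vector
    print(expand_labels(clusters, {0: "logistics", 2: "quality"}))
```

In this sketch, the `min_purity` threshold keeps label expansion conservative: a cluster whose labeled members disagree contributes no new labels. The expanded label set and the fused vectors would then feed whatever classifier the method uses, which the abstract does not specify.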

Citation:

Liu Hao, Wang Yuchen. Clustering Short Text Classification Based on Fusion of BERT and GSDMM. Computer Systems & Applications, 2022, 31(2): 267-272.

History
  • Received: April 6, 2021
  • Revised: April 30, 2021
  • Online: January 28, 2022
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3