Incremental Text Clustering Algorithm for Hot Topic Detection
    Abstract:

    The traditional Single-Pass clustering algorithm is highly sensitive to the input order of the data and has low accuracy. To address this, an incremental text clustering algorithm (SP-HTD) is proposed that works at subtopic granularity and accounts for the dynamics, timeliness, and contextual semantic features of news texts. First, building on the LDA2Vec topic model, document vectors and word vectors are trained jointly to obtain context vectors, so that the semantic features and importance relationships of the text are fully mined. Then, on the basis of the Single-Pass algorithm, subtopics are divided according to the extracted hot-topic feature words, a time threshold is set to confirm the timeliness of each cluster center, and the mined semantic features are combined with the tasks to update the cluster centers dynamically. Finally, with the help of the time characteristics, the centroid vectors of the topics are updated to improve the accuracy of text similarity calculation. The results show that the F value of the proposed method reaches up to 89.3% and that, while maintaining clustering accuracy, the method has significantly lower missed-detection and false-detection rates than the traditional algorithm, so it effectively improves the accuracy of topic detection.
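    The incremental step described in the abstract can be illustrated with a minimal sketch of Single-Pass clustering extended with a time threshold and a running centroid update. This is not the authors' SP-HTD implementation: the cosine similarity measure, the parameter names `sim_threshold` and `time_window`, the running-mean update rule, and the `Cluster` type are all assumptions made for illustration. Document vectors are taken as given (in the paper they would come from LDA2Vec context vectors), and the subtopic division step is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cluster:
    centroid: List[float]  # running mean of the member vectors
    size: int              # number of documents absorbed so far
    last_seen: float       # timestamp of the most recent member


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(
    docs: List[Tuple[float, List[float]]],
    sim_threshold: float = 0.8,   # hypothetical value, not from the paper
    time_window: float = 3.0,     # hypothetical value, not from the paper
) -> Tuple[List[Cluster], List[int]]:
    """Cluster (timestamp, vector) pairs in arrival order.

    Returns the clusters and, for each document, the index of its cluster.
    """
    clusters: List[Cluster] = []
    labels: List[int] = []
    for ts, vec in docs:
        best, best_sim = None, sim_threshold
        for idx, cluster in enumerate(clusters):
            # Time threshold: a cluster with no recent member is treated
            # as stale and cannot absorb new documents.
            if ts - cluster.last_seen > time_window:
                continue
            sim = cosine(vec, cluster.centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No sufficiently similar, still-active cluster: open a new one.
            clusters.append(Cluster(list(vec), 1, ts))
            labels.append(len(clusters) - 1)
        else:
            # Dynamic centroid update as an incremental mean.
            cluster = clusters[best]
            cluster.size += 1
            cluster.centroid = [
                m + (x - m) / cluster.size
                for m, x in zip(cluster.centroid, vec)
            ]
            cluster.last_seen = ts
            labels.append(best)
    return clusters, labels
```

    Because each document is compared only against existing centroids and assigned once, the result depends on arrival order, which is the sensitivity the abstract attributes to plain Single-Pass. In this sketch the time window limits which centroids are eligible, so a late document that resembles an old topic opens a new cluster instead of shifting a stale centroid.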

Get Citation

Guo Ying, Xue Tao, Hu Weihua. Incremental Text Clustering Algorithm for Hot Topic Detection. Computer Systems & Applications, 2022, 31(9): 280-286

History
  • Received: December 07, 2021
  • Revised: January 04, 2022
  • Online: July 07, 2022
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing, Postal Code: 100190