Short Text Clustering Algorithm with Improved Feature Weight

doi:10.15888/j.cnki.csa.006554

Volume 27, Issue 9, 2018, pp. 210-214

Author:
MA Cun
University of Chinese Academy of Sciences, Beijing 100049, China; Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China
GUO Rui-Feng
Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China
GAO Cen
Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China
SUN Yong
Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China

Abstract:

Short text research is a hot topic in natural language processing. Because short texts are sparse and heavily colloquial, clustering models built on them suffer from high dimensionality, poorly focused topics, and unclear semantic information. To address these problems, this study proposes a short text clustering algorithm with improved feature weighting. First, multi-factor weighting rules are defined: a comprehensive evaluation function is constructed from part-of-speech and symbolic sentiment analysis, and feature words are selected according to the relevance between each term and the text content. Next, a continuous skip-gram model trained on a large-scale corpus produces word vectors that represent the semantics of the feature words. Finally, the RWMD (Relaxed Word Mover's Distance) algorithm is used to measure the similarity between short texts, and the K-means algorithm clusters them. Clustering results on three test sets show that the algorithm effectively improves the accuracy of short text clustering.
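The RWMD step measures how far apart two short texts are. The sketch below follows the standard RWMD definition: each word's weight moves entirely to the nearest word of the other text, and the larger of the two one-sided costs is taken. It is a minimal illustration under stated assumptions, not the authors' implementation: hand-made 2-D vectors stand in for trained skip-gram embeddings, equal per-word weights stand in for the paper's multi-factor feature weights, and all function names and sample words are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Travel cost between two words: Euclidean distance of their embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative word weights so they sum to 1."""
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def one_sided_rwmd(src: Dict[str, float], dst: Dict[str, float],
                   emb: Dict[str, Vector]) -> float:
    """Relaxation keeping only the outgoing-flow constraint:
    every word of src sends all of its weight to its nearest word in dst."""
    return sum(
        weight * min(euclidean(emb[word], emb[other]) for other in dst)
        for word, weight in src.items()
    )


def rwmd(doc_a: Dict[str, float], doc_b: Dict[str, float],
         emb: Dict[str, Vector]) -> float:
    """RWMD: the larger of the two one-sided relaxations (a lower bound on WMD)."""
    a, b = normalize(doc_a), normalize(doc_b)
    return max(one_sided_rwmd(a, b, emb), one_sided_rwmd(b, a, emb))


# Hypothetical toy embeddings standing in for trained skip-gram vectors.
EMB = {
    "phone": (1.0, 0.0),
    "mobile": (0.9, 0.1),
    "battery": (1.0, 1.0),
    "movie": (-1.0, 0.0),
    "film": (-0.9, 0.1),
}

# Hypothetical short texts as {feature word: weight}; equal weights here,
# where the paper would plug in its improved feature weights.
text1 = {"phone": 1.0, "battery": 1.0}
text2 = {"mobile": 1.0, "battery": 1.0}
text3 = {"movie": 1.0, "film": 1.0}

print(round(rwmd(text1, text2, EMB), 4))  # prints 0.0707 (near-synonyms)
print(round(rwmd(text1, text3, EMB), 4))  # prints 2.0025 (unrelated topic)
```

Taking the maximum of the two one-sided costs gives a tighter lower bound on the full Word Mover's Distance than either side alone, while needing only nearest-neighbor searches instead of solving a transport problem. Because RWMD is defined between pairs of texts rather than as coordinates in a fixed vector space, the abstract does not say how it is combined with K-means centroids, so the sketch stops at the pairwise distance.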

Key words: feature weight; emotion analysis; word vector; RWMD distance

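Citation:

MA Cun, GUO Rui-Feng, GAO Cen, SUN Yong. Short Text Clustering Algorithm with Improved Feature Weight. Computer Systems & Applications (计算机系统应用), 2018, 27(9): 210-214.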

History:
Received: January 27, 2018
Revised: March 7, 2018
Online: August 17, 2018

Copyright: Institute of Software, Chinese Academy of Sciences