
Computer Systems Applications, 2019, 28(4): 139-144


Long Tail Text Clustering Algorithm Based on Frequent Patterns

SONG Zhong-Shan, ZHANG Guang-Kai, YIN Fan, TIE Jun

(School of Computer Science, South-Central University for Nationalities, Wuhan 430074, China)


Received: October 15, 2018; Revised: October 31, 2018

Abstract: Short-text clustering is a popular topic in information extraction. Large-scale short-text data exhibit a "long tail phenomenon", and traditional algorithms that cluster such data face high feature dimensionality and the loss of small-class information. To address these problems, this study proposes a Frequent Itemsets collaborative Pruning iteration Clustering framework (FIPC). The framework combines an iterative clustering framework with the K-medoids algorithm and applies a collaborative pruning strategy to cluster small-class texts. Experimental results show that FIPC effectively improves the clustering precision for small-class short texts and avoids overlapping clusters.

Keywords: text clustering; long tail phenomenon; frequent patterns; K-medoids algorithm
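
To make the two building blocks named in the abstract concrete, here is a minimal Python sketch of frequent-itemset mining over tokenized short texts followed by K-medoids clustering. It is not the authors' FIPC framework: the collaborative pruning and iterative steps are not described on this page and are not reproduced here. The function names, the brute-force itemset counting, the Jaccard distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Brute-force count of term sets (up to max_size) that occur in
    at least min_support documents. Hypothetical helper, not FIPC."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign(items, medoids):
    """Label each item with the index of its nearest medoid."""
    return [
        min(range(len(medoids)),
            key=lambda j: jaccard_distance(x, items[medoids[j]]))
        for x in items
    ]


def k_medoids(items, init, max_iter=100):
    """Alternate assignment and medoid update until medoids stop changing.
    init is a list of item indices used as starting medoids (assumed given)."""
    medoids = list(init)
    for _ in range(max_iter):
        labels = assign(items, medoids)
        new_medoids = []
        for j in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                new_medoids.append(medoids[j])  # keep medoid of an empty cluster
                continue
            # New medoid: the member with the smallest total distance to the rest.
            best = min(
                members,
                key=lambda i: sum(jaccard_distance(items[i], items[m])
                                  for m in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(items, medoids)


# Toy tokenized short texts (made up for this example).
docs = [
    ["cheap", "flight", "ticket"],
    ["flight", "ticket", "deal"],
    ["cheap", "flight", "deal"],
    ["python", "pandas", "error"],
    ["python", "error", "fix"],
    ["pandas", "python", "install"],
]

itemsets = frequent_itemsets(docs, min_support=2)

# Reduce dimensionality: keep only terms that are frequent on their own.
frequent_terms = {t for s in itemsets if len(s) == 1 for t in s}
reduced = [frozenset(doc) & frequent_terms for doc in docs]

medoids, labels = k_medoids(reduced, init=[0, 3])
print(medoids, labels)
```

Representing each text only by its frequent terms is one simple way to lower the feature dimensionality the abstract mentions. It also discards rare terms, so a real long-tail method would need extra handling for small classes.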


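Funding: Sub-project of the National Key Technology R&D Program of China (2015BAD29B01); Soft Science Research Project of the Ministry of Agriculture (D201721); Fundamental Research Funds for the Central Universities (CZY18016)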

Citation:
SONG Zhong-Shan, ZHANG Guang-Kai, YIN Fan, TIE Jun. Long Tail Text Clustering Algorithm Based on Frequent Patterns. Computer Systems Applications, 2019, 28(4): 139-144

Author Name	Affiliation	E-mail
SONG Zhong-Shan	School of Computer Science, South-Central University for Nationalities, Wuhan 430074, China
ZHANG Guang-Kai	School of Computer Science, South-Central University for Nationalities, Wuhan 430074, China
YIN Fan	School of Computer Science, South-Central University for Nationalities, Wuhan 430074, China
TIE Jun	School of Computer Science, South-Central University for Nationalities, Wuhan 430074, China	tiejun@mail.scuec.edu.cn