Abstract: Short text clustering is a widely studied topic in information extraction. When the data set is large, a "long tail phenomenon" appears, which leads to high-dimensional features and to the loss of information about small classes. To address these problems, this study proposes a Frequent Itemsets collaborative Pruning iteration Clustering framework (FIPC). The framework combines an iterative clustering framework with the K-medoids algorithm and uses a collaborative pruning strategy to cluster the texts of small classes. Experimental results show that the FIPC framework clusters small-class texts with high accuracy and avoids the problem of overlapping clusters.
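
For orientation, the K-medoids building block named in the abstract can be sketched as follows. This is only a minimal illustration and not the FIPC framework itself: the frequent-itemset mining, collaborative pruning, and outer iteration are not shown. Representing each short text as a set of tokens, using Jaccard distance, and the farthest-first initialisation are all assumptions made here, and the names `jaccard_distance` and `k_medoids` are hypothetical.

```python
from typing import FrozenSet, List, Sequence


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_medoids(items: Sequence[FrozenSet[str]], k: int, max_iter: int = 20) -> List[int]:
    """Alternate assignment and medoid-update steps; return one cluster label per item."""
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(items)")

    # Precompute all pairwise distances (fine for a small illustrative data set).
    dist = [[jaccard_distance(items[i], items[j]) for j in range(n)] for i in range(n)]

    # Assumption: deterministic farthest-first initialisation, chosen for reproducibility.
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(dist[i][m] for m in medoids)))

    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster of its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

        # Update step: the new medoid minimises total distance to its cluster members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # An empty cluster (possible with duplicate items) keeps its old medoid.
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))

        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids

    return labels
```

Because a medoid is always an actual data point, the method only needs pairwise distances, which is why it pairs naturally with set-based representations of short texts where a mean vector is not well defined.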