Abstract: As data dimensionality increases, traditional clustering algorithms perform poorly. SubKMeans is a powerful subspace clustering algorithm that searches for the best subspace for the K-Means algorithm and thereby reduces the impact of high dimensionality. However, the algorithm requires users to specify the number of clusters, K, in advance, and in practice users sometimes cannot provide an accurate value of K. To solve this problem, pairwise constraints are introduced and combined with the silhouette coefficient, and a SubKMeans algorithm that determines the number of clusters based on pairwise constraints is proposed. The improved silhouette coefficient evaluates clustering performance more accurately, so that the value of K can be determined. The experimental results demonstrate the effectiveness of the proposed method.
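The idea of choosing K with a constraint-aware silhouette coefficient can be illustrated with a minimal sketch. It is not the proposed algorithm. Plain K-Means with a farthest-first initialization stands in for SubKMeans, so no subspace is learned. The scoring rule, the silhouette coefficient multiplied by the fraction of satisfied pairwise constraints, is an assumption made here for illustration, since the abstract does not state the exact formula. The helper names (`farthest_first`, `constraint_rate`, `select_k`) and the toy data are hypothetical.

```python
import numpy as np

def farthest_first(X, k):
    """Deterministic initial centers: start at X[0], then add the farthest point."""
    idx = [0]
    d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(k - 1):
        idx.append(int(d.argmax()))
        d = np.minimum(d, np.linalg.norm(X - X[idx[-1]], axis=1))
    return X[idx].copy()

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's K-Means; a stand-in for SubKMeans (assumption)."""
    centers = farthest_first(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    ids = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

def constraint_rate(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that the labeling satisfies."""
    ok = sum(labels[i] == labels[j] for i, j in must_link)
    ok += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return ok / total if total else 1.0

def select_k(X, k_range, must_link, cannot_link):
    """Score each candidate K and return the best one (assumed scoring rule)."""
    scores = {}
    for k in k_range:
        labels = kmeans(X, k)
        scores[k] = silhouette(X, labels) * constraint_rate(labels, must_link, cannot_link)
    return max(scores, key=scores.get), scores

# Toy data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(20, 2)) for c in blob_centers])

must_link = [(0, 1), (20, 21), (40, 41)]      # pairs known to share a cluster
cannot_link = [(0, 20), (0, 40), (20, 40)]    # pairs known to be separated

best_k, scores = select_k(X, range(2, 6), must_link, cannot_link)
print(best_k, scores)
```

In this sketch a candidate K that violates constraints has its silhouette score scaled down, so the constraints act as side information that can break ties between values of K with similar geometric quality.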