Abstract:Short text classification is one of the hotspots of research in natural language processing. A new model of text representation is proposed in this study (N-of-DOC), and in order to solve the problem of sparse representation in Chinese, the Word2Vec distributed representation is used, finally, it is applied to the improved Convolution Neural Network (CNN) model to extract the high level features from the filter layer, the classification model is obtained by connecting the Softmax classifier after the pooling layer. In the experiment, the traditional text representation model and the improved text representation model are used as the input of the original data, respectively. It acts on the model of traditional machine learning (KNN, SVM, logistic regression, naive Bayes) and the improved CNN model. The results show that the proposed method can not only solve the dimension disaster and sparse problem of Chinese text vectors, but also improve the classification accuracy by 4.23% compared with traditional methods.