Abstract: High-dimensional data and noise have always been major factors affecting the accuracy of text classification. Feature selection and feature extraction are the main methods of dimensionality reduction and denoising. In this paper, the variance of the word probability distribution and the variance of the document distribution are used to improve the TF-IDF feature selection method (VAR-TF-IDF). After good features are selected, the CBOW+HS framework of word2vec is tuned. The superposition of the word embeddings of the selected words is then used as the feature vector, which can improve the accuracy of text classification. Experiments show that the proposed method is effective.
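The two-stage pipeline summarized above (variance-aware feature selection, then summing the embeddings of the selected words) can be illustrated with a minimal sketch. The abstract does not give the VAR-TF-IDF formula, so the score used here (term frequency times IDF times the variance of the term's per-class document rate) is an assumption chosen only for illustration. The function names `select_features` and `document_vector` are hypothetical, and the toy embedding dictionary stands in for vectors that would come from a trained CBOW+HS word2vec model.

```python
import math
from collections import Counter


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_features(docs, labels, top_k):
    """Rank terms by an ASSUMED variance-weighted TF-IDF score.

    score(w) = tf(w) * idf(w) * var(per-class document rate of w)
    This is an illustrative stand-in, not the paper's exact formula.
    """
    n_docs = len(docs)
    classes = sorted(set(labels))
    class_sizes = Counter(labels)
    tf = Counter(w for d in docs for w in d)        # corpus term frequency
    df = Counter(w for d in docs for w in set(d))   # document frequency
    scores = {}
    for w in tf:
        idf = math.log(n_docs / df[w])
        # Fraction of documents in each class that contain w.
        rates = [
            sum(1 for d, y in zip(docs, labels) if y == c and w in d)
            / class_sizes[c]
            for c in classes
        ]
        scores[w] = tf[w] * idf * variance(rates)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def document_vector(doc, selected, embeddings, dim):
    """Sum (superpose) the embeddings of the selected words in a document."""
    vec = [0.0] * dim
    for w in doc:
        if w in selected and w in embeddings:
            vec = [a + b for a, b in zip(vec, embeddings[w])]
    return vec


# Toy corpus: tokenized documents with class labels.
docs = [
    ["goal", "match", "the"],
    ["match", "team", "the"],
    ["stock", "market", "the"],
    ["market", "price", "the"],
]
labels = ["sport", "sport", "finance", "finance"]
selected = select_features(docs, labels, top_k=2)

# Toy 2-dimensional embeddings (placeholders for word2vec vectors).
embeddings = {"match": [1.0, 0.0], "market": [0.0, 2.0], "team": [5.0, 5.0]}
vector = document_vector(["match", "team", "market", "the"],
                         set(selected), embeddings, dim=2)
```

In this toy corpus the word "the" appears in every document, so its IDF is zero and it is never selected, while words concentrated in one class receive the highest scores. The resulting document vector ignores unselected words such as "team", which is the denoising effect the abstract attributes to feature selection.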