Abstract: Feature extraction and vector representation are the two key problems in document classification. In this paper, we propose a word2vec-based classification method that addresses both. The method builds the bag of feature words using Document Frequency (DF) so as to retain as many of the document's important features as possible. It then exploits the latent semantic information captured by word2vec to shrink the bag of feature words and effectively reduce the dimension of the document vector, replacing semantically related words with the product of a topic word and an appropriate parameter. In addition, it assigns each feature word an optimal weight by combining this representation with the TF-IDF algorithm. Compared with two other document classification methods, the proposed method shows a significant improvement, and the experimental results demonstrate its effectiveness.
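The pipeline summarized above (DF selection, word2vec-based merging of related words, TF-IDF weighting) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the 2-D word vectors are hypothetical stand-ins for trained word2vec embeddings, words are merged greedily when their cosine similarity to an existing topic word reaches a hypothetical `threshold`, the "appropriate parameter" is assumed to be that similarity, and a common smoothed IDF variant is used. All function names are illustrative.

```python
import math
from collections import Counter

def select_by_df(docs, min_df):
    """Document Frequency selection: keep words that occur in >= min_df documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per document
    return sorted(w for w, c in df.items() if c >= min_df)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def merge_similar(features, vectors, threshold):
    """Greedily map each feature word to (topic word, coefficient).

    A word whose cosine similarity to an existing topic word is >= threshold is
    replaced by that topic word scaled by the similarity (ASSUMED choice of the
    parameter); otherwise the word becomes a new topic word.
    """
    topics, mapping = [], {}
    for w in features:
        best, best_sim = None, threshold
        for t in topics:
            s = cosine(vectors[w], vectors[t])
            if s >= best_sim:
                best, best_sim = t, s
        if best is None:
            topics.append(w)
            mapping[w] = (w, 1.0)
        else:
            mapping[w] = (best, best_sim)
    return topics, mapping

def tfidf_vectors(docs, topics, mapping):
    """TF-IDF document vectors over the reduced topic dimensions."""
    tfs, df = [], Counter()
    for doc in docs:
        tf = Counter()
        for w in doc:
            if w in mapping:  # words dropped by DF selection are ignored
                topic, coef = mapping[w]
                tf[topic] += coef
        tfs.append(tf)
        df.update(tf.keys())  # each topic counted once per document
    n = len(docs)
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in topics}
    return [[tf[t] * idf[t] for t in topics] for tf in tfs]

docs = [["car", "auto", "engine", "road"], ["car", "engine", "fuel"],
        ["apple", "fruit", "banana"], ["fruit", "apple", "juice"]]
# Hypothetical 2-D stand-ins for trained word2vec embeddings.
vectors = {"car": (1.0, 0.0), "engine": (0.8, 0.6),
           "apple": (0.0, 1.0), "fruit": (0.28, 0.96)}

features = select_by_df(docs, min_df=2)  # ['apple', 'car', 'engine', 'fruit']
topics, mapping = merge_similar(features, vectors, threshold=0.75)
vecs = tfidf_vectors(docs, topics, mapping)  # 4 documents x 2 topic dimensions
```

In this toy run, "engine" is folded into "car" and "fruit" into "apple", so the document vectors shrink from four feature dimensions to two while the merged words still contribute weighted counts.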