Abstract: Word vector representations are an effective way to capture the grammatical and semantic information of words. To improve how accurately word vectors encode semantic information, this study proposes an improved training method based on GloVe, developed by analyzing the characteristics of the co-occurrence matrix and applying the distributional hypothesis. By analyzing word-frequency statistics from Wikipedia, the method derives general rules that characterize irrelevant words and noise words in the co-occurrence matrix. Finally, we evaluate the resulting word vectors on a word analogy dataset and a word correlation dataset. Experiments show that, in the same experimental environment, the proposed method shortens training time and improves accuracy on the semantic word analogy task.