Abstract: As a cornerstone of natural language processing, text representation learning has achieved major breakthroughs in semantic representation ability through the successive development of the vector space model, the word embedding model, and contextual distributed representations. These advances have in turn driven steady performance improvements in downstream tasks such as machine reading and text retrieval. However, pre-trained language models, the current state of the art in text representation learning, incur high time and space complexity in both the training and prediction stages, which raises a substantial barrier to their use. This study therefore proposes a new text representation learning method based on deep hashing and pre-training, which aims to preserve as much text representation ability as possible while requiring less computation. Experimental results show that the proposed method substantially reduces computational complexity and markedly improves model efficiency in the prediction stage.