Directly clustering geological texts with topic models yields low topic accuracy and poor continuity among topic keywords. This study proposes two improvements to address these problems. In the word segmentation stage, a repeated word string extraction algorithm based on word frequency statistics is adopted: geological terms are retained so that text topics can be extracted accurately, and redundant word strings are pruned to reduce memory cost, which improves the efficiency of retained word extraction. In addition, a text data augmentation algorithm based on term frequency-inverse document frequency (TF-IDF) and word vectors is applied to the segmented corpus to strengthen its topic features. The topic model is then run on the augmented corpus, and its performance improves because the corpus carries stronger prior information. Experimental results show that combining the proposed algorithm with the latent Dirichlet allocation (LDA) model performs well, outperforming the other methods on all related evaluation indexes and in the output results.
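
As an illustration of the TF-IDF side of the augmentation step, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that augmentation means appending extra copies of each document's highest-scoring TF-IDF terms so that a downstream topic model sees them more often. The function names `tfidf_scores` and `augment`, the parameters `top_k` and `copies`, and the toy corpus are all hypothetical, and the word-vector component is omitted entirely.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def augment(docs, top_k=2, copies=1):
    """Append `copies` extra occurrences of each document's top_k TF-IDF terms.

    Assumption: this duplication rule is illustrative only; the source does
    not specify how the augmented tokens are chosen or inserted.
    """
    augmented = []
    for doc, score in zip(docs, tfidf_scores(docs)):
        # Sort by descending score; break ties alphabetically for determinism.
        top = sorted(score, key=lambda t: (-score[t], t))[:top_k]
        augmented.append(list(doc) + top * copies)
    return augmented


# Hypothetical, already-segmented toy corpus.
docs = [
    ["granite", "intrusion", "fault", "zone"],
    ["fault", "zone", "copper", "ore"],
    ["granite", "copper", "ore", "vein"],
]
augmented = augment(docs, top_k=1, copies=2)
```

In this toy corpus, terms that occur in only one document receive the highest TF-IDF weight and are the ones duplicated. The augmented token lists could then be passed to any LDA implementation in place of the original segmented corpus.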