Abstract: Text matching is a key technique in natural language understanding; its task is to determine the similarity of two texts. In recent years, with the development of pre-trained language models, text matching methods built on such models have been widely adopted. However, these models still suffer from poor generalization to specific domains and weak robustness in semantic matching. This study therefore proposes an incremental pre-training and adversarial training method for low-frequency words to improve the performance of text matching models. Incremental pre-training on low-frequency words in the source domain helps the model transfer to the target domain and enhances its generalization ability. In addition, several adversarial training strategies targeting low-frequency words are explored to improve the model's tolerance to word-level perturbations and thereby its robustness. Experimental results on the LCQMC dataset and on a text matching dataset in the real estate domain show that incremental pre-training, adversarial training, and their combination all significantly improve text matching performance.
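As a concrete illustration of the word-level adversarial training summarized above, the following is a minimal sketch, not the paper's actual implementation, of an FGM-style embedding perturbation restricted to low-frequency tokens. It assumes a PyTorch classifier whose word-embedding matrix is reachable as `model.embeddings.weight`, and a hypothetical `low_freq_ids` tensor listing the vocabulary ids treated as low-frequency; both names are illustrative assumptions.

```python
import torch

def fgm_low_freq_step(model, loss_fn, batch, labels, low_freq_ids, epsilon=1.0):
    """One FGM-style adversarial training step restricted to low-frequency
    word embeddings (a sketch under assumed model attributes)."""
    emb = model.embeddings.weight  # (vocab_size, dim); attribute name is assumed

    # 1) Clean forward/backward pass to obtain gradients on the embedding matrix.
    loss = loss_fn(model(**batch), labels)
    loss.backward()

    # 2) Keep the gradient only on rows belonging to low-frequency words.
    grad = emb.grad.detach().clone()
    mask = torch.zeros(grad.size(0), 1, device=grad.device)
    mask[low_freq_ids] = 1.0
    grad = grad * mask

    # 3) Add an L2-normalized perturbation r = epsilon * g / ||g|| to those rows.
    norm = grad.norm()
    if norm > 0:
        r_adv = epsilon * grad / norm
        emb.data.add_(r_adv)

        # 4) Adversarial pass: its gradients accumulate on top of the clean ones.
        adv_loss = loss_fn(model(**batch), labels)
        adv_loss.backward()

        # 5) Restore the original embeddings before the optimizer step.
        emb.data.sub_(r_adv)

    return loss.detach()
```

After this step, the optimizer applies the accumulated clean-plus-adversarial gradients, which is the standard FGM recipe; the only modification sketched here is masking the perturbation to low-frequency rows, in the spirit of the word-level approach the abstract describes.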