Abstract: Similarity matching is crucial in natural language processing, particularly for extracting answers in question answering systems. This study proposes a text similarity matching model based on positive and negative samples and a Bi-LSTM. First, the model constructs question-answer pairs from positive and negative samples during training, which increases the similarity between a question and its correct answer. Second, it applies dual-layer word vector embedding during pre-training to reduce the errors caused by word segmentation mistakes. Third, it applies an internal attention mechanism before feature extraction to correct the backward offset of feature vectors caused by the attention mechanism. The data are then processed by a Bi-LSTM network to retain important temporal features. Finally, the model introduces a similarity function that incorporates semantic information, so that similarity is computed at the semantic level. The proposed model is evaluated on the public DuReader dataset and compared with other models. The experimental results show that it achieves high accuracy and good robustness, with a top-1 accuracy of 78.34%.
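The abstract does not state the training objective or the exact similarity function. A common way to use positive and negative question-answer pairs is a margin-based ranking (hinge) loss over cosine similarity. This loss pushes the score of the correct answer above that of an incorrect one by at least a margin. Top-1 accuracy then checks whether the highest-scoring candidate is the correct answer. The following is a minimal sketch under those assumptions. It operates on already-encoded vectors (for example, pooled Bi-LSTM outputs) and uses an assumed margin of 0.2. The function names are hypothetical, and the sketch is not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; returns 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def margin_ranking_loss(q: Sequence[float], a_pos: Sequence[float],
                        a_neg: Sequence[float], margin: float = 0.2) -> float:
    """Hinge loss max(0, margin - sim(q, a+) + sim(q, a-)).

    Assumption: cosine similarity and margin=0.2; the paper's own
    semantic similarity function may differ.
    """
    return max(0.0, margin - cosine(q, a_pos) + cosine(q, a_neg))


def top1_accuracy(questions: List[Sequence[float]],
                  candidates: List[List[Sequence[float]]],
                  gold: List[int]) -> float:
    """Fraction of questions whose highest-scoring candidate is the gold answer index."""
    hits = 0
    for q, cands, g in zip(questions, candidates, gold):
        scores = [cosine(q, c) for c in cands]
        best = max(range(len(cands)), key=lambda i: scores[i])
        hits += int(best == g)
    return hits / len(questions)


if __name__ == "__main__":
    q = [1.0, 0.0]
    good, bad = [0.9, 0.1], [0.0, 1.0]
    # Correct answer already scores well above the negative: zero loss.
    print(margin_ranking_loss(q, good, bad))
    # Positive and negative swapped: the loss is large and would drive training.
    print(margin_ranking_loss(q, bad, good))
```

The hinge form only penalizes a training triple when the negative answer comes within the margin of the positive one. This is one way to realize "increasing the similarity between a question and its correct answer" relative to incorrect answers.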