Abstract: Automatic text summarization is an important branch of natural language processing (NLP), and one of its central difficulties is evaluating the quality of generated summaries quickly, objectively, and accurately. Existing methods for evaluating summary quality suffer from low evaluation accuracy, dependence on reference texts, and high computational cost. To address these problems, this study proposes a summary quality evaluation method based on large language models. A prompt construction method grounded in the chain-of-thought principle is designed to improve the performance of large language models on summary quality evaluation. At the same time, a chain-of-thought dataset is generated and a small-scale large language model is trained by fine-tuning, which significantly reduces the computational requirements. The proposed method first determines the evaluation dimensions according to the characteristics of text summaries and constructs prompts based on the chain-of-thought principle. These prompts guide a large language model to generate chain-of-thought reasoning and evaluation results for summary samples, from which a chain-of-thought dataset is built. The dataset is then used to fine-tune a small-scale large language model, and the fine-tuned model performs the final summary quality evaluation. Comparative experiments and analyses on the SummEval dataset show that this method significantly improves the evaluation accuracy of the small-scale large language model on the summary quality evaluation task. The study thus provides a reference-free summary quality evaluation method with high evaluation accuracy, low computational requirements, and easy deployment.
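A minimal sketch of the pipeline described above, assuming hypothetical evaluation dimensions, prompt wording, and a placeholder query_teacher_llm function rather than the paper's actual implementation:

```python
# Illustrative sketch: build chain-of-thought prompts per evaluation dimension,
# collect a teacher model's reasoning and scores, and write them out as a
# fine-tuning dataset for a small-scale model. All names (DIMENSIONS, prompt
# wording, query_teacher_llm) are assumptions, not the authors' code.

import json

# Assumed evaluation dimensions for summary quality.
DIMENSIONS = ["coherence", "consistency", "fluency", "relevance"]

COT_PROMPT_TEMPLATE = (
    "You are evaluating the {dimension} of a summary.\n"
    "Source document:\n{document}\n\n"
    "Summary:\n{summary}\n\n"
    "Reason step by step about the summary's {dimension}, "
    "then give a score from 1 to 5 on the last line as 'Score: <n>'."
)


def build_prompt(document: str, summary: str, dimension: str) -> str:
    """Construct a chain-of-thought evaluation prompt for one dimension."""
    return COT_PROMPT_TEMPLATE.format(
        dimension=dimension, document=document, summary=summary
    )


def query_teacher_llm(prompt: str) -> str:
    """Placeholder for a call to a large teacher model (API or local).
    Returns chain-of-thought reasoning followed by a score line."""
    # Canned response so the sketch runs end to end without a real model.
    return "The summary covers the main points of the document ...\nScore: 4"


def generate_cot_dataset(samples, path="cot_finetune.jsonl"):
    """Collect (prompt, teacher response) pairs as instruction-tuning data."""
    with open(path, "w", encoding="utf-8") as f:
        for document, summary in samples:
            for dim in DIMENSIONS:
                prompt = build_prompt(document, summary, dim)
                response = query_teacher_llm(prompt)
                f.write(json.dumps(
                    {"instruction": prompt, "output": response},
                    ensure_ascii=False) + "\n")


if __name__ == "__main__":
    demo = [("The city council approved the new park budget on Monday ...",
             "Council approves park budget.")]
    generate_cot_dataset(demo)
    # The resulting JSONL would then be used to fine-tune a small-scale LLM,
    # which afterwards performs the reference-free summary quality evaluation.
```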