Received: December 6, 2022; Revised: January 19, 2023
Abstract: High-quality question-answer pairs help readers obtain knowledge from articles, improve the performance of question-answering systems, and advance machine reading comprehension, so they matter both in human activities and in artificial intelligence. Current mainstream methods for generating question-answer pairs rely on candidate answers supplied from the article and generate a question for each candidate answer. However, some candidate answers lead to questions that cannot be answered from the article, or to questions whose answer is no longer the candidate answer. Either case weakens the correlation between question and answer and lowers the quality of the pair. To address this problem, this study proposes a method that generates question-answer pairs based on key phrase extraction and filtering. The method automatically extracts key phrases suitable for question generation from the input text and uses them as candidate answers. A question generator and an answer generator then produce question-answer pairs from these candidates. Finally, the method compares each candidate answer with the corresponding generated answer and filters out pairs whose similarity is low, so that only high-quality pairs are output. The method is evaluated experimentally on the SQuAD 1.1 and NewsQA datasets, and the quality of the generated pairs is also checked manually. The results show that the method effectively improves the quality of the generated question-answer pairs.
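The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or threshold is used, so token-overlap F1 and the value 0.5 below are assumptions chosen for illustration. The names `token_f1`, `generate_filtered_pairs`, `ask`, and `answer` are hypothetical; the two generators are passed in as plain callables rather than tied to any particular model.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def token_f1(candidate: str, generated: str) -> float:
    """Token-overlap F1 between two answer strings (case-insensitive).

    Assumption: this measure stands in for the unspecified similarity
    used to compare a candidate answer with a generated answer.
    """
    cand_tokens = candidate.lower().split()
    gen_tokens = generated.lower().split()
    if not cand_tokens or not gen_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand_tokens) & Counter(gen_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen_tokens)
    recall = overlap / len(cand_tokens)
    return 2 * precision * recall / (precision + recall)


def generate_filtered_pairs(
    context: str,
    candidates: Iterable[str],
    ask: Callable[[str, str], str],     # hypothetical question generator
    answer: Callable[[str, str], str],  # hypothetical answer generator
    threshold: float = 0.5,             # assumed cut-off, not from the source
) -> List[Tuple[str, str]]:
    """Keep only pairs whose generated answer resembles the candidate."""
    pairs = []
    for cand in candidates:
        question = ask(context, cand)
        predicted = answer(context, question)
        if token_f1(cand, predicted) >= threshold:
            pairs.append((question, predicted))
    return pairs
```

In this sketch a pair is dropped whenever the answer generator, given only the article and the generated question, fails to recover something close to the key phrase that prompted the question. Returning the generated answer rather than the candidate answer is a design choice of the sketch, not something the abstract specifies.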
Keywords: question-answer pair; candidate answer; key phrase extraction; T5 model; similarity filtering
Funding: National Natural Science Foundation of China (61976053, 62171131); Natural Science Foundation of Fujian Province (2022J01398)
Cite as:
GUO Zheng-Rong, GUO Gong-De, WANG Hui. Question-Answer Pair Generation Based on Key Phrase Extraction and Answer Filtering. Computer Systems Applications, 2023, 32(6): 293-300