Question-answer Pair Generation Based on Key Phrase Extraction and Answer Filtering
Fund Project: National Natural Science Foundation of China (61976053, 62171131); Natural Science Foundation of Fujian Province (2022J01398)


    Abstract:

    High-quality question-answer pairs play an important role in both human activities and artificial intelligence: they help people obtain knowledge from articles, improve the performance of question-answering systems, and advance machine reading comprehension. Current mainstream question-answer pair generation methods take candidate answers from the provided article and generate a question specific to each answer. However, some candidate answers lead to questions that cannot be answered from the article, or to questions whose answers are no longer the candidate answers. Both cases weaken the correlation between question and answer and lower the quality of the generated pairs. To address this problem, this study proposes a question-answer pair generation method based on key phrase extraction and filtering. The method automatically extracts key phrases suitable for question generation from the input text as candidate answers, and then uses a question generator and an answer generator to produce question-answer pairs from these candidates. Finally, it compares the similarity between each candidate answer and the corresponding generated answer, filters out pairs with low correlation, and outputs high-quality question-answer pairs. The method is evaluated experimentally on the SQuAD1.1 and NewsQA datasets, and the quality of the generated question-answer pairs is also checked manually. The results show that the method effectively improves the quality of the generated question-answer pairs.
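The abstract outlines a three-stage pipeline: extract key phrases as candidate answers, generate a question for each candidate and then an answer to that question, and keep only the pairs whose generated answer is sufficiently similar to the candidate. The Python sketch below illustrates only the final filtering stage, and it rests on assumptions that do not come from the paper: the question generator and answer generator are arbitrary callables supplied by the caller, similarity is measured with token-overlap F1 roughly in the style of the SQuAD evaluation script, and the cutoff `threshold=0.5` is a placeholder. The helper names `normalize`, `token_f1` and `generate_and_filter` are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical signature: (context, answer) -> question
QuestionGenerator = Callable[[str, str], str]
# Hypothetical signature: (context, question) -> answer
AnswerGenerator = Callable[[str, str], str]


def normalize(text: str) -> List[str]:
    """Lowercase, replace punctuation with spaces, drop English articles, tokenize."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in {"a", "an", "the"}]


def token_f1(candidate: str, generated: str) -> float:
    """Token-overlap F1 between the candidate answer and the generated answer.

    Assumption: stands in for whatever similarity measure the paper uses.
    """
    cand, gen = normalize(candidate), normalize(generated)
    overlap = sum((Counter(cand) & Counter(gen)).values())
    if overlap == 0:
        return 0.0  # also covers empty strings, avoiding division by zero
    precision = overlap / len(gen)
    recall = overlap / len(cand)
    return 2 * precision * recall / (precision + recall)


def generate_and_filter(
    context: str,
    candidate_answers: List[str],
    question_generator: QuestionGenerator,
    answer_generator: AnswerGenerator,
    threshold: float = 0.5,  # placeholder cutoff, not taken from the paper
) -> List[Tuple[str, str]]:
    """Return (question, candidate_answer) pairs that pass the similarity filter."""
    kept = []
    for answer in candidate_answers:
        question = question_generator(context, answer)
        predicted = answer_generator(context, question)
        # Keep the pair only if the answer recovered from the question
        # is close enough to the candidate answer it was generated from.
        if token_f1(answer, predicted) >= threshold:
            kept.append((question, answer))
    return kept


if __name__ == "__main__":
    context = "Marie Curie won the Nobel Prize in Physics in 1903."
    # Toy stand-ins for trained question and answer generators.
    toy_questions = {
        "1903": "When did Marie Curie win the Nobel Prize in Physics?",
        "Marie Curie": "What did she win?",
    }
    toy_answers = {
        "When did Marie Curie win the Nobel Prize in Physics?": "in 1903",
        "What did she win?": "the Nobel Prize in Physics",
    }
    pairs = generate_and_filter(
        context,
        ["1903", "Marie Curie"],
        lambda ctx, ans: toy_questions[ans],
        lambda ctx, q: toy_answers[q],
    )
    print(pairs)
    # [('When did Marie Curie win the Nobel Prize in Physics?', '1903')]
```

With the toy stubs, the pair for "1903" survives (F1 of about 0.67 against "in 1903"), while the pair for "Marie Curie" is dropped because the generated answer shares no tokens with the candidate, which mirrors the abstract's case of a question whose answer is no longer the candidate answer. Token F1 is used here only because it is simple and common for extractive QA; an embedding-based similarity would slot into the same place.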

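Cite this article:

Guo Zhengrong, Guo Gongde, Wang Hui. Question-answer pair generation based on key phrase extraction and answer filtering. Computer Systems & Applications, 2023, 32(6): 293-300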

History
  • Received: 2022-12-06
  • Last revised: 2023-01-19
  • Published online: 2023-04-25
Copyright: Institute of Software, Chinese Academy of Sciences