Computer Systems & Applications, 2021, 30(7): 204-209
Deep Text Retrieval Re-Ranking Based on Adversarial Data Augmentation
(School of Science, Northeastern University, Shenyang 110819, China)
Received:November 09, 2020    Revised:December 12, 2020
Abstract: Neural ranking models are widely used for ranking tasks in information retrieval, and they demand training data of very high quality. However, information retrieval datasets usually contain considerable noise, and documents truly irrelevant to a query cannot be identified precisely, so obtaining high-quality negative samples is essential for training a high-performance neural ranking model. Inspired by the existing doc2query method, we propose AQGM, a deep end-to-end model that learns from mismatched query-document pairs to generate adversarial queries that are irrelevant to the document yet similar to the original query, thereby increasing query diversity and improving the quality of negative samples. We then train a BERT-based deep ranking model on the real samples together with the samples generated by AQGM. Experiments show that, compared with the BERT-base baseline, our method improves MRR by 0.3% on MSMARCO and by 3.2% on TrecQA.
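The augmentation scheme the abstract describes can be sketched as a data-construction step. This is an illustrative assumption of the pipeline, not the authors' code: `generate_adversarial_query` stands in for the AQGM seq2seq generator (here a trivial word-shuffle placeholder), and all function and field names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class TrainingPair:
    query: str
    document: str
    label: int  # 1 = relevant, 0 = irrelevant


def generate_adversarial_query(document: str, original_query: str) -> str:
    """Placeholder for the AQGM generator. In the paper this is a
    seq2seq model trained on mismatched query-document pairs that
    emits a query similar to the original but irrelevant to the
    document; here we merely perturb the original query."""
    words = original_query.split()
    random.shuffle(words)
    return " ".join(words)


def build_training_set(positives, noisy_negatives):
    """Mix real samples with AQGM-style adversarial negatives.

    positives:       list of (query, relevant_document) pairs
    noisy_negatives: list of (query, irrelevant_document) pairs
    """
    data = [TrainingPair(q, d, 1) for q, d in positives]
    data += [TrainingPair(q, d, 0) for q, d in noisy_negatives]
    # Augmentation: pair each relevant document with a generated
    # adversarial query, yielding a hard negative (a query that looks
    # like the original but for which the document is not relevant).
    for q, d in positives:
        adv_q = generate_adversarial_query(d, q)
        data.append(TrainingPair(adv_q, d, 0))
    return data
```

The resulting mixture of real positives, dataset negatives, and generated hard negatives would then be used to fine-tune a BERT-based cross-encoder ranker, as the abstract outlines.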
Citation:
CHEN Li-Ping,REN Jun-Chao.Deep Text Retrieval Re-Ranking Based on Adversarial Data Augmentation.COMPUTER SYSTEMS APPLICATIONS,2021,30(7):204-209