A Method for Constructing Training Sets in Text Search Ranking

Computer Systems & Applications, 2010, Vol. 19, No. 10, pp. 199-202

Authors:
WANG Li
Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027
SHUAI Jian-Mei

Fund Project: National High Technology Research and Development Program of China (863 Program) (2006AA01Z449)

Construct Training Set for Learning to Rank in Web Search

Author:

WANG Li
SHUAI Jian-Mei

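Abstract:

In text search, building ranking models with learning-to-rank methods is becoming increasingly common. The performance of a ranking model depends heavily on its training set, and each training sample requires a human to label how relevant a document is to a given query. In text search the number of possible queries is practically unbounded, while manual labeling is time-consuming and laborious, so it is worthwhile to select only a subset of informative queries for labeling. This paper proposes a greedy algorithm that simultaneously considers query difficulty, density, and diversity to select informative queries for labeling from a massive query pool. Experimental results on LETOR and on data collected from a Web search engine database show that the proposed method can construct a smaller yet effective training set.

Keywords: information retrieval; learning to rank; training set construction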

Abstract:

Learning to rank has become a popular way to build ranking models for Web search. For a given ranking algorithm, the performance of the resulting model depends on the training set. Each training sample is constructed by having a human label the relevance of a document to a given query. However, the number of possible queries in Web search is nearly infinite, and human labeling is expensive. It is therefore necessary to select a subset of queries from which to construct an efficient training set. In this paper, a greedy algorithm is developed that selects queries by simultaneously taking query difficulty, density, and diversity into consideration. Experimental results on LETOR and on a dataset collected from a Web search engine show that the proposed method yields a smaller, more efficient training set.

Key words: information retrieval; learning to rank; training set construction
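The abstract names the three selection criteria but does not say how each is computed or how they are combined. The Python sketch below illustrates one plausible greedy selection under assumptions that are not taken from the paper: each query is represented by a hypothetical feature vector (for example, an average of its query-document features), difficulty scores are supplied externally, density is a query's mean cosine similarity to the rest of the pool, diversity is one minus its highest similarity to an already-selected query, and the three are combined by a weighted sum. The names `select_queries`, `cosine`, and `weights` are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_queries(
    features: Dict[str, Sequence[float]],
    difficulty: Dict[str, float],
    budget: int,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[str]:
    """Greedily choose up to `budget` queries to send for labeling.

    Hypothetical scoring (not the paper's formula): a weighted sum of
    difficulty, density and diversity. `difficulty` must contain a score
    for every query id in `features`.
    """
    w_diff, w_dens, w_div = weights
    ids = list(features)

    # Density: mean cosine similarity to all other queries in the pool, so a
    # query lying in a well-populated region counts as more representative.
    density: Dict[str, float] = {}
    for q in ids:
        sims = [cosine(features[q], features[o]) for o in ids if o != q]
        density[q] = sum(sims) / len(sims) if sims else 0.0

    selected: List[str] = []

    def score(q: str) -> float:
        # Diversity: 1 minus the largest similarity to any already-selected
        # query, so near-duplicates of chosen queries are penalized.
        if selected:
            div = 1.0 - max(cosine(features[q], features[s]) for s in selected)
        else:
            div = 1.0
        return w_diff * difficulty[q] + w_dens * density[q] + w_div * div

    remaining = set(ids)
    while remaining and len(selected) < budget:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy pool: q1 and q2 are near-duplicates, q3 points elsewhere.
    pool = {"q1": [1.0, 0.0], "q2": [0.9, 0.1], "q3": [0.0, 1.0]}
    diff = {"q1": 0.5, "q2": 0.5, "q3": 0.5}
    print(select_queries(pool, diff, budget=2))  # ['q2', 'q3']
```

In the toy run, `q1` and `q2` have nearly identical vectors. Once `q2` is chosen, the diversity term drives `q1`'s score down, so the second pick is `q3` even though it has the lowest density. Because this sketch trades the criteria off linearly, changing the weights (or replacing the sum with another combination) changes which queries end up being labeled.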

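Cite this article:

WANG Li, SHUAI Jian-Mei. A Method for Constructing Training Sets in Text Search Ranking. Computer Systems & Applications, 2010, 19(10): 199-202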

History

Received: 2010-01-18
Last revised: 2010-02-26