Computer Systems Applications, 2010, 19(10): 199-202

A Method for Constructing Training Sets for Ranking in Text Search

WANG Li, SHUAI Jian-Mei

(Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027, China)

Construct Training Set for Learning to Rank in Web Search

Received: January 18, 2010; Revised: February 26, 2010

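Chinese abstract: In text search, building ranking models with learning-to-rank methods is increasingly common. The performance of a ranking model depends heavily on its training set, and each training sample requires a human to label how relevant a document is to a given query. Because the space of possible queries in text search is nearly unbounded and manual labeling is slow and laborious, it is worthwhile to select only a subset of informative queries for labeling. This paper proposes a greedy algorithm that selects informative queries from a massive query pool by jointly considering query difficulty, density, and diversity. Experiments on LETOR and on data drawn from a Web search engine's database show that the proposed method can construct a small yet effective training set.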

Chinese keywords: information retrieval; learning to rank; training set construction

Abstract: Learning to rank has become a popular method for building ranking models for Web search. For a given ranking algorithm, the performance of the ranking model depends on the training set. Each training sample is constructed by having a human label the relevance of a document to a given query. However, the number of queries in Web search is nearly infinite, and human labeling is expensive. It is therefore necessary to select a subset of queries to construct an efficient training set. In this paper, an algorithm is developed that selects queries by simultaneously taking query difficulty, density, and diversity into consideration. Experimental results on LETOR and a collected Web search dataset show that the proposed method leads to a more efficient training set.

Keywords: information retrieval; learning to rank; training set construction
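
The abstract names three criteria (query difficulty, density, and diversity) and a greedy selection procedure, but gives no formulas. The sketch below is only an illustration of how such a selection could look, not the authors' algorithm. Everything in it is an assumption: the function names `cosine` and `select_queries`, the use of cosine similarity between query feature vectors, the definition of density as mean similarity to the rest of the pool, the definition of diversity as one minus the highest similarity to an already selected query, and the choice to multiply the three terms into one score. The difficulty values are assumed to be supplied by some external estimator.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_queries(
    features: Dict[str, Sequence[float]],
    difficulty: Dict[str, float],
    budget: int,
) -> List[str]:
    """Greedily pick up to `budget` query ids to send for human labeling.

    Hypothetical scoring: difficulty * density * diversity (an assumption,
    not taken from the paper).
    """
    ids = list(features)

    # Density (assumed definition): mean similarity to all other pool queries.
    density: Dict[str, float] = {}
    for q in ids:
        sims = [cosine(features[q], features[p]) for p in ids if p != q]
        density[q] = sum(sims) / len(sims) if sims else 1.0

    selected: List[str] = []
    remaining = set(ids)
    while remaining and len(selected) < budget:

        def score(q: str) -> float:
            # Diversity (assumed definition): one minus the highest
            # similarity to any query that has already been selected.
            closest = max(
                (cosine(features[q], features[s]) for s in selected),
                default=0.0,
            )
            return difficulty[q] * density[q] * max(0.0, 1.0 - closest)

        # sorted() makes tie-breaking deterministic across runs.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy pool: q1/q2 are near-duplicates, as are q3/q4.
    features = {"q1": [1.0, 0.0], "q2": [1.0, 0.1],
                "q3": [0.0, 1.0], "q4": [0.1, 1.0]}
    difficulty = {"q1": 0.9, "q2": 0.8, "q3": 0.7, "q4": 0.2}
    print(select_queries(features, difficulty, budget=2))
```

With a budget of two, the diversity term discourages picking both members of a near-duplicate pair, so the selection covers both clusters in the toy pool. A weighted sum of the three terms would be an equally plausible design; the product is used here only because it needs no extra weights.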

Funding: National High Technology Research and Development Program of China (863 Program) (2006AA01Z449)

Author Name	Affiliation
WANG Li	Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027, China
SHUAI Jian-Mei

Cite this article:
WANG Li, SHUAI Jian-Mei. Construct Training Set for Learning to Rank in Web Search. COMPUTER SYSTEMS APPLICATIONS, 2010, 19(10): 199-202