Computer Systems Applications (English edition): 2010, 19(10): 199-202
Construct Training Set for Learning to Rank in Web Search
(Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027, China)
Received: January 18, 2010    Revised: February 26, 2010
Abstract: Learning to rank has become a popular way to build ranking models for Web search. For a given ranking algorithm, the performance of the resulting model depends heavily on the training set. Each training sample is built by having a human label the relevance of a document to a given query. However, the number of possible queries in Web search is nearly infinite, and human labeling is time-consuming and expensive, so it is worthwhile to select a subset of informative queries to label. This paper proposes a greedy algorithm that selects informative queries from a large query pool by simultaneously considering query difficulty, density, and diversity. Experimental results on LETOR and on a dataset collected from a Web search engine show that the proposed method can construct a smaller yet effective training set.
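The abstract names the three selection criteria but does not define how they are scored or combined. As a rough illustration only, the sketch below shows one way a greedy query-selection loop of this kind could look. Everything specific in it is an assumption rather than the authors' method: the function names `cosine` and `greedy_select`, the use of cosine similarity over hypothetical query feature vectors, density as mean similarity to the other candidates, diversity as a penalty on similarity to already-selected queries, and the linear weighting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_select(features, difficulty, k, w_density=1.0, w_diversity=1.0):
    """Greedily pick up to k query indices to send for labeling.

    features   -- one feature vector per candidate query (hypothetical input)
    difficulty -- one difficulty score per query, higher means harder (assumed given)
    The scoring formula is an illustrative assumption, not taken from the paper.
    """
    n = len(features)
    # Density (assumed definition): mean similarity to all other candidates,
    # so queries in crowded regions of the query space score higher.
    density = [
        sum(cosine(features[i], features[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    selected = []
    remaining = set(range(n))
    while remaining and len(selected) < k:
        def score(i):
            # Diversity (assumed definition): penalize similarity to the
            # closest query that has already been selected.
            redundancy = max(
                (cosine(features[i], features[s]) for s in selected), default=0.0
            )
            return difficulty[i] + w_density * density[i] - w_diversity * redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage: queries 0 and 1 are near-duplicates, query 2 is different.
toy_features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
toy_difficulty = [0.9, 0.8, 0.5]
print(greedy_select(toy_features, toy_difficulty, k=2))
```

In this toy input the diversity penalty is what keeps the second pick away from the near-duplicate of the first; setting `w_diversity=0.0` removes that effect. The weights are free parameters of the sketch and would need tuning on real data.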
Funding: National High Technology Research and Development Program of China (863 Program) (2006AA01Z449)
Cite as:
WANG Li (王黎), SHUAI Jian-Mei (帅建梅). Construct Training Set for Learning to Rank in Web Search. COMPUTER SYSTEMS APPLICATIONS, 2010, 19(10): 199-202