Abstract: Learning to rank has become a popular method for building ranking models for Web search. For a given ranking algorithm, the performance of the resulting model depends on the training set. Each training sample is constructed by having a human label the relevance of a document to a given query. However, the number of possible queries in Web search is nearly infinite, and human labeling is expensive. It is therefore necessary to select a subset of queries from which to construct an efficient training set. In this paper, an algorithm is developed that selects queries by simultaneously taking query difficulty, density, and diversity into consideration. Experimental results on LETOR and a collected Web search dataset show that the proposed method can lead to a more efficient training set.
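
The abstract does not state how difficulty, density, and diversity are combined, so the following is only an illustrative sketch and not the paper's algorithm. It assumes each query has a feature vector and a precomputed difficulty score, takes density to be a query's mean cosine similarity to the other queries, takes diversity to be its dissimilarity to the queries already selected, and greedily picks the query with the highest weighted sum. The names `select_queries`, `cosine`, and the weights `w_diff`, `w_dens`, `w_div` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_queries(features, difficulty, k, w_diff=1.0, w_dens=1.0, w_div=1.0):
    """Greedily pick up to k query indices (assumed scoring, not the paper's).

    features:   one feature vector per query (assumed representation)
    difficulty: one difficulty score per query (assumed to be given)
    """
    n = len(features)
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]

    # Density: mean similarity to all other queries (self-similarity excluded).
    density = [
        (sum(sim[i]) - sim[i][i]) / (n - 1) if n > 1 else 0.0
        for i in range(n)
    ]

    selected = []
    remaining = set(range(n))
    while remaining and len(selected) < k:

        def score(i):
            # Diversity: 1 minus the similarity to the closest selected query.
            diversity = 1.0 - max((sim[i][j] for j in selected), default=0.0)
            return w_diff * difficulty[i] + w_dens * density[i] + w_div * diversity

        # Iterating in sorted order breaks ties in favor of the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: queries 0 and 1 are identical, query 2 is orthogonal to both.
    toy_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    toy_difficulty = [0.9, 0.8, 0.1]
    print(select_queries(toy_features, toy_difficulty, k=2, w_div=2.0))
```

In this sketch, raising `w_div` pushes the selection away from near-duplicate queries, while raising `w_dens` favors queries that are representative of many others. How the actual method balances the three criteria is not specified in the abstract.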