Received: March 11, 2015; Revised: April 15, 2015
Abstract: General-purpose search engines often suffer from topic drift: some of the results returned for a query are unrelated to the domain that the query keywords belong to. This paper proposes TSRR (Topic Sensitive Re-Ranking), a web page re-ranking algorithm for specific domains that addresses topic drift from a new perspective. TSRR first builds a vector model that represents the domain and is independent of the page ranking, then builds a web page information model; during retrieval, it combines the domain vector model with the web page information model to re-rank the search results. The algorithm is evaluated on a crawled domain-specific dataset, using user satisfaction and precision as criteria. Compared with the classic Lucene-based ranking algorithm, TSRR improves user satisfaction by 17.3% and precision by 41.9% on average.
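The abstract does not give the formulas behind the domain model or the web page information model, so the following is only a minimal sketch of the general idea of re-ranking search results against a domain vector. The bag-of-words vectors, the cosine similarity, the linear blend, and the names `term_vector`, `rerank`, and `alpha` are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector.

    Hypothetical stand-in for both the domain model and the
    web page information model described in the abstract.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(results, domain_vector, alpha=0.5):
    """Re-rank (page_text, base_score) pairs.

    base_score is the score from the underlying engine, assumed to be
    normalised to [0, 1]. alpha is an assumed blending weight between
    the base score and the similarity to the domain vector.
    """
    scored = []
    for text, base_score in results:
        similarity = cosine(term_vector(text), domain_vector)
        scored.append((alpha * base_score + (1 - alpha) * similarity, text))
    # Highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


domain = term_vector("python code compiler programming language")
results = [
    ("python snake habitat reptile", 0.9),          # off-topic, high base score
    ("python programming language tutorial", 0.8),  # on-topic
]
reranked = rerank(results, domain)
```

In this toy run the on-topic page moves ahead of the off-topic one even though its base score is lower, which is the kind of correction for topic drift that the abstract describes.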
Keywords: domain model; web information model; re-ranking
Funding: National High Technology Research and Development Program of China (863 Program) (2012AA011005)
Citation:
PAN Cheng, WU Gong-Qing, LI Lei, HU Xue-Gang. Web Page Re-Ranking Algorithm for Specific Domain Based on Domain Model. COMPUTER SYSTEMS APPLICATIONS (计算机系统应用), 2015, 24(11): 107-114