Web Page Re-Ranking Algorithm for Specific Domain Based on Domain Model
    Abstract:

    General-purpose search engines often suffer from the topic-drift problem: during retrieval, some of the returned results are irrelevant to the domain keywords. To address this problem for a specific domain, we propose a domain-specific web page re-ranking algorithm, TSRR (Topic Sensitive Re-Ranking). TSRR builds a vector model for the specific domain that is independent of PageRank, together with a web page information model, and then combines the two to re-rank the search results during retrieval. TSRR is evaluated on customer satisfaction and precision. Experimental results on datasets crawled for specific domains show that TSRR performs well: compared with the ranking algorithm in Lucene, it improves customer satisfaction by 17.3% and precision by 41.9% on average.
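The abstract does not give TSRR's actual formulas. The sketch below is a rough illustration only and rests on several assumptions. The domain model is a term-weight vector. The page information model is reduced to raw term frequencies. The final score is a linear blend of a normalized base retrieval score, which stands in for Lucene's score, and the cosine similarity to the domain vector. `alpha`, `page_vector`, `rerank`, and the sample data are all hypothetical. The sketch also includes precision@k, one common way to measure the precision of a ranked list.

```python
import math
from collections import Counter


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def page_vector(text: str) -> dict[str, float]:
    """Hypothetical page model: term frequencies of lowercased tokens."""
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def rerank(results, domain_vec, alpha=0.5):
    """Re-rank (doc_id, base_score, text) tuples.

    Assumed blend (not the paper's formula):
    alpha * (base / max_base) + (1 - alpha) * cosine(page, domain).
    """
    max_base = max((s for _, s, _ in results), default=0.0) or 1.0
    scored = []
    for doc_id, base, text in results:
        sim = cosine(page_vector(text), domain_vec)
        scored.append((alpha * base / max_base + (1 - alpha) * sim, doc_id))
    # Python's sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda p: p[0], reverse=True)
    return [doc_id for _, doc_id in scored]


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / k


if __name__ == "__main__":
    domain = {"pagerank": 1.0, "ranking": 1.0, "search": 1.0}
    results = [
        ("d1", 9.0, "cheap flights and hotel deals"),
        ("d2", 6.0, "search engine ranking with pagerank"),
        ("d3", 3.0, "ranking of football teams"),
    ]
    ranked = rerank(results, domain)
    print(ranked)  # ['d2', 'd1', 'd3']
    print(precision_at_k(ranked, {"d2"}, 1))  # 1.0
```

In this toy run, the off-topic page `d1` has the highest base score but no overlap with the domain vector, so the blend moves the on-topic page `d2` to the top. This is the kind of topic-drift correction the abstract describes, shown here under the stated assumptions.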

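Citation:

Pan Cheng, Wu Gongqing, Li Lei, Hu Xuegang. Web Page Re-Ranking Algorithm for Specific Domain Based on Domain Model. Computer Systems & Applications, 2015, 24(11): 107-114.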
History
  • Received: March 11, 2015
  • Revised: April 15, 2015
  • Online: December 3, 2015
Copyright: Institute of Software, Chinese Academy of Sciences