Dynamic Tunneling Heuristic for Focused Crawling

    Abstract:

    Topic islands among Web pages seriously degrade the performance of focused crawlers. Adding more seed links to discover new topics cannot guarantee comprehensive coverage of the relevant Web pages. Based on an analysis of typical crawling strategies, and taking into account the hierarchy of topic relevance, we propose a crawling strategy that uses dynamic tunneling. The strategy applies topic-based tunneling to discover new topics and constructs a hierarchical topic model to address the weak links between topic islands. It also dynamically controls the tunneling depth along the crawling direction while preserving the semantic information of the topic, which effectively prevents the topic drift caused by collecting too many off-topic pages. Experimental results show that the proposed method better addresses the topic island problem and thereby improves the recall of focused search engines.
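
The general idea of tunneling with a bounded depth can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the depth-control rule or the hierarchical topic model, so the rule used here (scaling the tunneling budget by the score of the last on-topic page) is an assumption made purely for illustration. The names `crawl_with_tunneling`, `fetch_links`, `relevance`, `threshold`, and `max_tunnel`, as well as the toy `GRAPH` and `SCORES`, are hypothetical.

```python
import heapq
from typing import Callable, Iterable, List

def crawl_with_tunneling(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: URL -> outgoing links
    relevance: Callable[[str], float],            # hypothetical: URL -> topic score in [0, 1]
    threshold: float = 0.5,
    max_tunnel: int = 2,
    max_pages: int = 100,
) -> List[str]:
    """Best-first crawl that may pass through a bounded run of off-topic pages."""
    seeds = list(seeds)
    # Heap entries are (-priority, url, budget); budget is the number of
    # consecutive off-topic pages that may still be expanded on this path.
    frontier = [(-1.0, url, max_tunnel) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    on_topic: List[str] = []
    visited = 0
    while frontier and visited < max_pages:
        neg_priority, url, budget = heapq.heappop(frontier)
        visited += 1
        score = relevance(url)
        if score >= threshold:
            on_topic.append(url)
            # Assumed rule: a more relevant page earns a deeper tunnel.
            child_budget = max(1, round(max_tunnel * score))
            child_priority = score
        elif budget > 0:
            # Off-topic page inside the tunnel: spend one unit of budget
            # and lower the priority of whatever lies behind it.
            child_budget = budget - 1
            child_priority = -neg_priority * 0.5
        else:
            # Budget exhausted: stop here to limit topic drift.
            continue
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-child_priority, link, child_budget))
    return on_topic

# Toy link graph: on-topic pages A and D are separated by off-topic B and C.
GRAPH = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
SCORES = {"A": 1.0, "B": 0.1, "C": 0.1, "D": 0.9}

if __name__ == "__main__":
    for depth in (1, 2):
        found = crawl_with_tunneling(
            ["A"], GRAPH.__getitem__, SCORES.__getitem__, max_tunnel=depth
        )
        print(depth, found)
```

In this toy graph, a budget of one off-topic hop stops at page C and never reaches D, while a budget of two crosses both off-topic pages and reaches the second island. A real crawler would score fetched page content rather than URLs, and would need a policy for pages that are first reached along a path with a smaller budget.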

Citation

Jiang Kun, Zhu Lei, Wang Yichuan. Dynamic Tunneling Heuristic for Focused Crawling. Computer Systems & Applications, 2020, 29(3): 253-260

History
  • Received: July 19, 2019
  • Revised: August 22, 2019
  • Online: March 2, 2020
  • Published: March 15, 2020
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3