Prefix-Based XML Frequent Path Mining Algorithm
    Abstract:

    XML documents are semi-structured data, and XML frequent path mining can be divided into two steps: XML document serialization and sequence mining. Existing serialization methods express an XML document as a set of XPath paths, which introduces considerable node redundancy. Apriori-based algorithms require multiple scans of the database and can generate a large number of candidate sets, while the PrefixSpan algorithm generates a large number of projected databases that occupy a lot of memory. To address these shortcomings, this paper proposes an efficient mining algorithm called the Prefix-based XML Frequent Path Mining Algorithm (PXFP). PXFP traverses the XML document tree breadth-first and represents each node as “node: parent node”, which reduces node redundancy. PXFP does not generate projected databases; it only retrieves the child nodes of the current prefix and then extends the frequent pattern using the position information of the frequent sub-paths, which reduces the number of database scans. Experimental results show that PXFP achieves higher time and space efficiency than PrefixSpan.
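
    The serialization step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `serialize_bfs` and `xpath_paths`, the sample document `DOC`, and the `#` placeholder used for the root's missing parent are all assumptions made here for illustration. The sketch compares a breadth-first "node:parent" encoding with a root-to-leaf path encoding of the same tree.

```python
from collections import deque
import xml.etree.ElementTree as ET


def serialize_bfs(xml_text):
    """Return "node:parent" strings in breadth-first order (illustrative only)."""
    root = ET.fromstring(xml_text)
    sequence = []
    queue = deque([(root, None)])  # each entry is (element, parent tag)
    while queue:
        element, parent_tag = queue.popleft()
        # The root has no parent; "#" is a placeholder assumed for this sketch.
        parent_label = parent_tag if parent_tag is not None else "#"
        sequence.append(f"{element.tag}:{parent_label}")
        for child in element:
            queue.append((child, element.tag))
    return sequence


def xpath_paths(xml_text):
    """Return root-to-leaf tag paths, the path-set style of serialization."""
    root = ET.fromstring(xml_text)
    paths = []
    stack = [(root, [root.tag])]
    while stack:
        element, path = stack.pop()
        children = list(element)
        if not children:
            paths.append("/".join(path))
        # Push children in reverse so they are visited in document order.
        for child in reversed(children):
            stack.append((child, path + [child.tag]))
    return paths


# Hypothetical sample document, not taken from the paper.
DOC = "<book><title/><author><name/><email/></author></book>"

if __name__ == "__main__":
    print(serialize_bfs(DOC))
    print(xpath_paths(DOC))
```

    For this sample, the path encoding repeats `book` in every path and `author` in two of them, while the "node:parent" encoding lists each of the five nodes exactly once.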

Citation

Zhang Jie, Mao Guojun. Prefix-Based XML Frequent Path Mining Algorithm. Computer Systems & Applications, 2018, 27(1): 78-85

History
  • Received: April 9, 2017
  • Revised: May 9, 2017
  • Online: December 22, 2017