Prefix-Based XML Frequent Path Mining Algorithm

doi:10.15888/j.cnki.csa.006166


Volume 27, Issue 1, 2018, pp. 78-85


Authors:
ZHANG Jie, School of Information, Central University of Finance and Economics, Beijing 100081, China
MAO Guo-Jun, School of Information, Central University of Finance and Economics, Beijing 100081, China


Abstract:

XML documents are semi-structured data, and XML frequent path mining can be divided into two steps: XML document serialization and sequence mining. Existing serialization methods express an XML document as a set of XPath paths, which introduces considerable node redundancy. Apriori-based algorithms require multiple scans of the database and can generate a large number of candidate sets, while the PrefixSpan algorithm generates a large number of projected databases that occupy a lot of memory. To address these shortcomings, this paper proposes an efficient mining algorithm called the Prefix-based XML Frequent Path Mining Algorithm (PXFP). PXFP traverses the XML document tree breadth-first and represents each node as “node: parent node”, which reduces node redundancy. PXFP does not generate projected databases; instead, it retrieves only the child nodes of the current prefix and then extends the frequent pattern using the position information of the frequent sub-paths, which reduces the number of database scans. Experimental results show that PXFP achieves higher time and space efficiency than PrefixSpan.

Key words: XML frequent path mining; serialization; location information; prefix
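
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PXFP implementation: the abstract gives no pseudocode, so the function names (`serialize_bfs`, `children_index`, `mine_frequent_paths`), the choice to store the parent's index instead of its label, and the definition of support as the number of documents containing a parent-child label chain are all assumptions made for illustration. The sketch serializes each document breadth-first as (node, parent) pairs, then grows frequent paths by following the children of the recorded end positions of each prefix, without building projected databases.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque


def serialize_bfs(xml_text):
    """Serialize a document as (label, parent_index) pairs in BFS order.

    The root has parent_index -1. Using the parent's index (not its label)
    is an assumption of this sketch, so repeated labels stay unambiguous.
    """
    root = ET.fromstring(xml_text)
    nodes = []
    queue = deque([(root, -1)])
    while queue:
        elem, parent_idx = queue.popleft()
        idx = len(nodes)
        nodes.append((elem.tag, parent_idx))
        for child in elem:
            queue.append((child, idx))
    return nodes


def children_index(nodes):
    """Map each node index to the indices of its child nodes."""
    kids = defaultdict(list)
    for idx, (_, parent_idx) in enumerate(nodes):
        if parent_idx >= 0:
            kids[parent_idx].append(idx)
    return kids


def mine_frequent_paths(docs, min_support):
    """Return {label path: support}; support counts documents (assumed)."""
    serialized = [serialize_bfs(d) for d in docs]
    kids = [children_index(n) for n in serialized]

    # Position lists: path -> {doc_id: [index of the path's last node]}.
    start = defaultdict(lambda: defaultdict(list))
    for doc_id, nodes in enumerate(serialized):
        for idx, (label, _) in enumerate(nodes):
            start[(label,)][doc_id].append(idx)

    frequent = {}
    frontier = {p: pos for p, pos in start.items() if len(pos) >= min_support}
    while frontier:
        next_frontier = {}
        for path, pos in frontier.items():
            frequent[path] = len(pos)
            # Extend the prefix only through children of its end positions.
            ext = defaultdict(lambda: defaultdict(list))
            for doc_id, ends in pos.items():
                for end in ends:
                    for child in kids[doc_id][end]:
                        label = serialized[doc_id][child][0]
                        ext[path + (label,)][doc_id].append(child)
            for new_path, new_pos in ext.items():
                if len(new_pos) >= min_support:
                    next_frontier[new_path] = new_pos
        frontier = next_frontier
    return frequent
```

For example, `serialize_bfs("<a><b><c/></b><d/></a>")` returns `[("a", -1), ("b", 0), ("d", 0), ("c", 1)]`: each node is stored once with a reference to its parent, instead of repeating ancestors in every root-to-node path.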

Citation:

ZHANG Jie, MAO Guo-Jun. Prefix-Based XML Frequent Path Mining Algorithm. Computer Systems & Applications, 2018, 27(1): 78-85


History

Received: April 9, 2017
Revised: May 9, 2017
Online: December 22, 2017
