Mining frequent patterns is a fundamental problem in many data mining applications. Mining frequent closed itemsets yields complete and non-redundant results for frequent pattern analysis. The growth of bioinformatics has produced datasets with new characteristics: they typically contain a very large number of columns and comparatively few rows. Such high-dimensional datasets pose a great challenge for existing closed frequent pattern discovery algorithms. This paper surveys algorithms for mining frequent closed itemsets in very high-dimensional data and presents a hierarchy that organizes them by their characteristics. We compare two row enumeration-based algorithms and discuss an algorithm that automatically switches between feature enumeration and row enumeration during mining, based on the characteristics of the data subset under consideration. Finally, we point out research directions in this field.
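To illustrate why row enumeration suits data with few rows and many columns, here is a minimal brute-force sketch. It is not any of the surveyed algorithms and does no pruning. It relies on one property: every closed itemset equals the intersection of the rows that contain it. Enumerating row subsets therefore finds every frequent closed itemset, at a cost exponential in the number of rows rather than the number of columns. The function name `closed_itemsets_by_rows` and the toy dataset are hypothetical.

```python
from itertools import combinations

def closed_itemsets_by_rows(rows, min_sup):
    """Hypothetical naive row-enumeration miner (no pruning).

    rows: list of item collections, one per row (e.g. one per sample).
    min_sup: minimum number of supporting rows (treated as at least 1).
    Returns {frozenset(itemset): support} for all frequent closed itemsets.
    """
    rows = [frozenset(r) for r in rows]
    min_sup = max(min_sup, 1)
    result = {}
    # Enumerate row subsets instead of column subsets: cost grows with the
    # number of rows, which is small in microarray-style data.
    for k in range(min_sup, len(rows) + 1):
        for ids in combinations(range(len(rows)), k):
            # The intersection of a row set's items is a closed itemset.
            items = frozenset.intersection(*(rows[i] for i in ids))
            if not items:
                continue
            # Support counts every row containing the itemset (may exceed k).
            result[items] = sum(1 for r in rows if items <= r)
    return result

# Toy usage example with hypothetical data.
rows = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"}, {"b", "c"}]
closed = closed_itemsets_by_rows(rows, 2)
for itemset, support in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With a minimum support of 2, the sketch reports {a}, {b} and {c} with support 3, and {a,b}, {a,c} and {b,c} with support 2. A feature-enumeration miner would instead grow itemsets column by column, which becomes impractical when there are thousands of columns.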