Design and Implementation of Distributed MST Clustering

Computer Systems & Applications, Volume 20, Issue 7, 2011, pp. 69-75

Authors: JIN Xin, WANG Jing, SHEN Qi-Wei
Affiliation: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; EB Information Technology Co. Ltd., Beijing 100083, China

Abstract:

Clustering is one of the most important problems in data mining. A clustering algorithm can group data without any prior knowledge about it and uncover valuable information. Data mining has recently become increasingly common in the telecommunications field, but clustering is still not widely used there because of obstacles such as the volume of the data, the types of data involved, and the computational complexity. This article presents a minimum spanning tree (MST) algorithm suited to distributed computing. Combining it with a similarity representation suited to the algorithm, the article designs a new clustering algorithm for analyzing massive data sets. It then shows how the algorithm is implemented on the MapReduce distributed computing model.

Key words: clustering; distributed computing; Hadoop; MapReduce; data mining; MST
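
The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of the general idea of MST clustering in a map/reduce style, not the authors' method. It assumes Euclidean distance as a stand-in for the article's similarity measure, simulates mappers and reducers with plain Python functions rather than Hadoop, and uses hypothetical names (`map_phase`, `reduce_phase`, `mst_clusters`). Each simulated mapper reduces its slice of edges to a local spanning forest, the simulated reducer merges the surviving edges into the global MST, and cutting the k-1 heaviest MST edges yields k clusters.

```python
import math
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used to detect cycles and label components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected; this edge would close a cycle
        self.parent[ra] = rb
        return True


def kruskal(n, edges):
    """Minimum spanning forest of (weight, u, v) edges over nodes 0..n-1."""
    uf = UnionFind(n)
    return [e for e in sorted(edges) if uf.union(e[1], e[2])]


def map_phase(n, edges, num_partitions):
    """Each simulated mapper reduces its slice of edges to a local forest."""
    slices = [edges[i::num_partitions] for i in range(num_partitions)]
    return [kruskal(n, s) for s in slices]


def reduce_phase(n, local_forests):
    """The simulated reducer merges the surviving edges into the global MST."""
    merged = [e for forest in local_forests for e in forest]
    return kruskal(n, merged)


def mst_clusters(points, k, num_partitions=3):
    """Split points into k clusters by cutting the k-1 heaviest MST edges."""
    n = len(points)
    # Assumption: Euclidean distance stands in for the article's similarity measure.
    edges = [(math.dist(points[i], points[j]), i, j)
             for i, j in combinations(range(n), 2)]
    mst = reduce_phase(n, map_phase(n, edges, num_partitions))
    kept = sorted(mst)[: n - k]  # the MST has n-1 edges; keep the n-k lightest
    uf = UnionFind(n)
    for _, u, v in kept:
        uf.union(u, v)
    groups = {}
    for i in range(n):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
print(mst_clusters(points, k=3))
```

Discarding edges inside each partition is safe because an edge rejected by a local Kruskal pass is the heaviest edge on some cycle and therefore cannot belong to the global MST. A real Hadoop job would also have to avoid materializing all pairwise edges on one machine, which this sketch does not attempt.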


Citation:

JIN Xin, WANG Jing, SHEN Qi-Wei. Design and Implementation of Distributed MST Clustering. Computer Systems & Applications, 2011, 20(7): 69-75


History

Received: November 3, 2010
Revised: December 15, 2010
