Design and Implementation of Distributed MST Clustering
    Abstract:

    Clustering is one of the most important problems in data mining. A clustering algorithm can group data without any prior knowledge about it and uncover valuable information. Data mining is increasingly used in the telecommunications field, but clustering is still not widely applied there because of problems such as the size of the data, the types of data involved, and the computational complexity. This article presents a minimum spanning tree (MST) algorithm suited to distributed computing. Combined with a similarity representation suited to this algorithm, it yields a new clustering algorithm for analyzing massive data sets. The article then shows how the algorithm is implemented on the MapReduce distributed computing model.

Get Citation

Jin Xin, Wang Jing, Shen Qiwei. Design and Implementation of Distributed MST Clustering. Computer Systems & Applications, 2011, 20(7): 69-75.

History
  • Received: November 03, 2010
  • Revised: December 15, 2010