Received:January 31, 2016
Abstract: Data mining is an effective way to address the problem of abundant data but scarce knowledge. Outlier mining is one of the important research topics in data mining and has been widely applied to network intrusion detection, credit card fraud detection, spam analysis, and gene mutation analysis. In massive high-dimensional data, the large data volume and high dimensionality seriously degrade both the accuracy and the efficiency of outlier mining. Building on KNN and defining the concept of a "solving set", this paper implements a distance-based outlier mining algorithm in the MapReduce programming environment. Experiments on synthetic data sets and UCI data sets examine how the parameters affect the algorithm's performance under different conditions.
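The general idea of distance-based KNN outlier scoring in a map/reduce style can be illustrated with a minimal single-machine sketch. It is not the paper's algorithm: it omits the "solving set" entirely, and the scoring rule (sum of distances to the k nearest neighbors), the function names, and the partitioning scheme are all assumptions made for illustration.

```python
import heapq
import math
from collections import defaultdict


def map_partition(partition, points, k):
    """Map step (hypothetical): for every point, emit its k smallest
    distances to the points held in this one partition."""
    for pid, p in points:
        dists = [math.dist(p, q) for qid, q in partition if qid != pid]
        yield pid, heapq.nsmallest(k, dists)


def reduce_scores(mapped, k):
    """Reduce step (hypothetical): merge the partial lists for each point
    and score it by the sum of its k smallest distances overall."""
    merged = defaultdict(list)
    for pid, dists in mapped:
        merged[pid].extend(dists)
    return {pid: sum(heapq.nsmallest(k, d)) for pid, d in merged.items()}


def top_outliers(data, k=2, n=1, num_partitions=2):
    """Return the indices of the n points with the largest KNN score."""
    points = list(enumerate(data))
    # Round-robin split standing in for the input splits of a real job.
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    mapped = [kv for part in partitions for kv in map_partition(part, points, k)]
    scores = reduce_scores(mapped, k)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Calling `top_outliers(data, k, n)` on a list of coordinate tuples returns the indices of the `n` highest-scoring points. Because each partition contributes its own k smallest distances, merging them in the reduce step yields the same k nearest neighbors as a single global scan.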
Keywords: MapReduce; distance-based; KNN; outliers; data mining
Author Name | Affiliation |
REN Yan | Shanxi Special Education Secondary School, Taiyuan 030012, China |
Citation:
任燕.基于MapReduce与距离的离群数据并行挖掘算法.计算机系统应用,2018,27(2):151-156
REN Yan. Parallel Mining of Distance-Based Outliers Using MapReduce. COMPUTER SYSTEMS APPLICATIONS, 2018, 27(2): 151-156