Computer Systems & Applications, 2018, 27(2): 151-156
Parallel Mining of Distance-Based Outliers Using MapReduce
(Shanxi Special Education Secondary School, Taiyuan 030012, China)
Received: January 31, 2016
Abstract: Data mining is an effective way to address the problem of abundant data but scarce knowledge. Outlier mining is one of the main research topics in data mining and has been widely applied to network intrusion detection, credit card fraud detection, spam analysis, and gene mutation analysis. In massive high-dimensional data, the large data volume and high dimensionality seriously degrade both the accuracy and the efficiency of outlier mining. Building on KNN, this study defines the concept of a "solving set" and implements a distance-based outlier mining algorithm under the MapReduce programming model. Experiments on synthetic data sets and UCI data sets examine how the parameters affect the algorithm's performance under different conditions.
Keywords: MapReduce; distance-based; KNN; outlier mining
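The abstract names the ingredients (KNN distances, MapReduce) but not the algorithm's steps. As a rough illustration only, the sketch below scores each point by the sum of distances to its k nearest neighbors and splits the work into a map step and a reduce step. It is not the paper's method: the "solving set" is not modeled, the scoring rule is an assumption, and all function names (`map_partition`, `reduce_scores`, `top_outliers`) are hypothetical. Partitions are plain Python lists standing in for the input splits a real MapReduce job would use.

```python
import heapq
import math
from collections import defaultdict


def distance(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def map_partition(partition, all_points, k):
    """Map step (assumed design): for every point, emit its k smallest
    distances to the points held by this one partition."""
    for idx, point in all_points:
        dists = [distance(point, other) for j, other in partition if j != idx]
        yield idx, heapq.nsmallest(k, dists)


def reduce_scores(mapped, k):
    """Reduce step: merge the partial lists of each point and score it by
    the sum of its k smallest distances overall."""
    merged = defaultdict(list)
    for idx, partial in mapped:
        merged[idx].extend(partial)
    return {idx: sum(heapq.nsmallest(k, d)) for idx, d in merged.items()}


def top_outliers(points, k=2, n=1, num_partitions=2):
    """Return the indices of the n points with the largest KNN scores."""
    indexed = list(enumerate(points))
    # Round-robin split; a real job would rely on the framework's input splits.
    partitions = [indexed[i::num_partitions] for i in range(num_partitions)]
    mapped = []
    for part in partitions:
        mapped.extend(map_partition(part, indexed, k))
    scores = reduce_scores(mapped, k)
    return heapq.nlargest(n, scores, key=scores.get)
```

Merging per-partition candidates is safe because the k nearest neighbors of a point over the whole data set are always contained in the union of its k nearest neighbors within each partition. For example, `top_outliers([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2, n=1)` flags the point at index 4, which lies far from the other four.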
Cite this article:
REN Yan. Parallel Mining of Distance-Based Outliers Using MapReduce. Computer Systems & Applications, 2018, 27(2): 151-156