
Computer Systems Applications, 2018, 27(2): 151-156


Parallel Mining of Distance-Based Outliers Using MapReduce

REN Yan

(Shanxi Special Education Secondary School, Taiyuan 030012, China)

Received: January 31, 2016

Abstract: Data mining is an effective way to address the problem of abundant data but scarce knowledge. Outlier mining is one of the main research topics in data mining and has been widely applied to network intrusion detection, credit card fraud detection, spam analysis, gene mutation analysis, and similar fields. In massive high-dimensional data, the large data volume and high dimensionality seriously degrade both the accuracy and the efficiency of outlier mining. Building on KNN (k-nearest neighbors), this study defines the concept of a "solving set" and implements a distance-based outlier mining algorithm under the MapReduce programming model. Experiments on synthetic data sets and UCI data sets examine how the parameters affect the algorithm's performance under different conditions.

Keywords: MapReduce; distance-based; KNN; outlier data mining
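
To make the idea of distance-based outlier mining in a map/reduce style concrete, the following is a minimal single-process sketch in Python. It is not the author's algorithm: the abstract does not define the "solving set", so that part is omitted. The outlier score (sum of distances to the k nearest neighbors), the round-robin partitioning, and the names `map_partition`, `reduce_knn`, and `top_outliers` are all assumptions made for illustration.

```python
import heapq
import math
from collections import defaultdict


def map_partition(partition, points, k):
    """Map step (illustrative): for every point, emit its k smallest
    distances to the points held in one partition."""
    for pid, p in points.items():
        dists = [math.dist(p, q) for qid, q in partition.items() if qid != pid]
        yield pid, heapq.nsmallest(k, dists)


def reduce_knn(pid, partial_lists, k):
    """Reduce step (illustrative): merge the per-partition candidates into
    the global k nearest distances and sum them into an outlier score."""
    merged = heapq.nsmallest(k, (d for lst in partial_lists for d in lst))
    return pid, sum(merged)  # assumed score: sum of the k nearest distances


def top_outliers(points, k, n, num_partitions=2):
    """Return the ids of the n highest-scoring points and all scores."""
    ids = sorted(points)
    # Assumed partitioning: round-robin split of the point ids.
    partitions = [{i: points[i] for i in ids[j::num_partitions]}
                  for j in range(num_partitions)]
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for part in partitions:
        for pid, local in map_partition(part, points, k):
            grouped[pid].append(local)
    scores = dict(reduce_knn(pid, lists, k) for pid, lists in grouped.items())
    return heapq.nlargest(n, scores, key=scores.get), scores


# Toy data: four points on a unit square and one point far away.
data = {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (10.0, 10.0)}
outliers, scores = top_outliers(data, k=2, n=1)
print(outliers)  # [4]
```

Each map call only needs one partition of the data, and the reduce step only merges short lists of candidate distances, which is what makes this kind of computation suitable for a MapReduce framework. A real deployment would broadcast or block the query points rather than pass the full `points` dictionary to every mapper.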


Author Name	Affiliation
REN Yan	Shanxi Special Education Secondary School, Taiyuan 030012, China


Cite this article:
REN Yan. Parallel Mining of Distance-Based Outliers Using MapReduce. COMPUTER SYSTEMS APPLICATIONS, 2018, 27(2): 151-156