Parallel Mining of Distance-Based Outliers Using MapReduce

doi:10.15888/j.cnki.csa.005435

2018, Vol. 27, No. 2, pp. 151-156



Abstract:

Data mining is an effective way to address the problem of abundant data but scarce knowledge. Outlier mining is one of the important research topics in data mining and has been widely applied to network intrusion detection, credit card fraud detection, spam analysis, and gene mutation analysis. In massive high-dimensional data, the large data volume and high dimensionality severely degrade both the accuracy and the efficiency of outlier mining. Building on KNN, this paper defines the concept of a "solving set" and implements a distance-based outlier mining algorithm under the MapReduce programming model. Experiments on synthetic data sets and UCI data sets examine how the parameters affect the algorithm's performance under different conditions.
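The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of KNN distance-based outlier scoring arranged as map and reduce steps. It is not the paper's method. The scoring rule (sum of distances to the k nearest neighbors), the function names, the partition size, and the toy data are all assumptions made for illustration. The "solving set" is not modeled, and the MapReduce framework is simulated in plain Python, with every mapper assumed to see the full list of query points.

```python
import heapq
import math
from collections import defaultdict
from itertools import islice


def chunks(points, size):
    """Split the indexed data set into partitions, one per simulated mapper."""
    it = iter(enumerate(points))
    while True:
        part = list(islice(it, size))
        if not part:
            return
        yield part


def map_partition(partition, all_points, k):
    """Map step: for every point, emit its k smallest distances to this partition."""
    for i, p in enumerate(all_points):
        # Skip the point itself so it is never counted as its own neighbor.
        dists = [math.dist(p, q) for j, q in partition if j != i]
        yield i, heapq.nsmallest(k, dists)


def reduce_point(candidate_lists, k):
    """Reduce step: merge partial neighbor lists and sum the k smallest distances."""
    merged = heapq.nsmallest(k, (d for lst in candidate_lists for d in lst))
    return sum(merged)


def top_outliers(points, k=2, n=1, partition_size=3):
    """Return the indices of the n points with the largest k-NN distance sums."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for partition in chunks(points, partition_size):
        for key, partial in map_partition(partition, points, k):
            grouped[key].append(partial)
    scores = {i: reduce_point(lists, k) for i, lists in grouped.items()}
    return heapq.nlargest(n, scores, key=scores.get)


# Hypothetical toy data: five points near the origin and one far away.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (0.05, 0.05)]
print(top_outliers(data, k=2, n=1))  # prints [4]
```

Because each mapper returns only the k smallest distances per point, the reducer can recover the exact global k nearest distances by merging the partial lists, so the result does not depend on how the data are partitioned.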


Cite this article

任燕 (Ren Yan). Parallel Mining of Distance-Based Outliers Using MapReduce. Computer Systems & Applications (计算机系统应用), 2018, 27(2): 151-156


History

Received: 2016-01-31
Published online: 2018-02-05
