Abstract:Data mining technology is an effective approach to resolve the problem of abundant data and scanty information. Outlier mining is one of the main research topic in the field of data mining, and it has been widely used in network intrusion detection, line card fraud, spam analysis, gene mutation analysis, etc. In high-dimensional data, the data volume and high dimension affect the effects of outlier data mining and efficiency seriously. In view of the high dimensional data, this study adopts the KNN implementing a distance-based outlier data mining algorithms under the MapReduce programming model by defining the “solving set”. Using artificial data set and UCI data set, the influence of parameters on the algorithm performance is discussed under different conditions in the experiment.