Abstract: Analyzing and extracting knowledge from massive data is computationally expensive. To address this problem, a Hadoop-based knowledge extraction framework is proposed. We design a knowledge extraction framework that combines parallel processing with distributed storage and is compatible with different prototype reduction methods. The prototype reduction method is parallelized using the MapReduce programming model, and a combination rule for the reduced prototype sets is designed to deliver high classification accuracy and computational speed. Finally, experimental results on real large-scale UCI data sets show that the proposed framework reduces the classification time of the nearest neighbor classifier by two orders of magnitude.
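
To make the map/reduce split concrete, the following is a minimal single-machine sketch of the idea, not the framework's actual implementation. It assumes Hart's condensed nearest neighbor as the prototype reduction method and plain concatenation as the combination rule; both are illustrative choices, and the function names (`condense`, `map_phase`, `reduce_phase`) are hypothetical. On Hadoop, each call to `condense` would run inside a mapper over one data split.

```python
import math


def nearest_label(prototypes, x):
    # 1-NN rule: return the label of the prototype closest to x (Euclidean).
    return min(prototypes, key=lambda p: math.dist(p[0], x))[1]


def condense(partition):
    # Assumed reduction method: Hart's condensed nearest neighbor.
    # Start from one sample and add every sample that the current
    # prototype set misclassifies, until a full pass adds nothing.
    prototypes = [partition[0]]
    changed = True
    while changed:
        changed = False
        for sample in partition:
            x, y = sample
            # Skip samples already kept, so conflicting duplicates cannot loop.
            if sample not in prototypes and nearest_label(prototypes, x) != y:
                prototypes.append(sample)
                changed = True
    return prototypes


def map_phase(data, n_partitions):
    # Split the data into disjoint partitions and reduce each independently.
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    return [condense(p) for p in partitions if p]


def reduce_phase(reduced_sets):
    # Assumed combination rule: concatenate the per-partition prototype sets.
    return [proto for subset in reduced_sets for proto in subset]


# Toy data: two well-separated classes of 25 points each.
data = [((i * 0.1, j * 0.1), 0) for i in range(5) for j in range(5)]
data += [((5 + i * 0.1, 5 + j * 0.1), 1) for i in range(5) for j in range(5)]

prototypes = reduce_phase(map_phase(data, n_partitions=2))
print(len(data), "samples reduced to", len(prototypes), "prototypes")
print(nearest_label(prototypes, (0.2, 0.2)), nearest_label(prototypes, (5.2, 5.2)))
```

The nearest neighbor classifier then searches only the combined prototype set instead of the full training data, which is where the reduction in classification time comes from.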