Abstract: Rankings, a popular product of modern society, have become part of everyday life. Computing rankings over massive data consumes substantial hardware resources and time even in a distributed environment, and sometimes the rankings cannot be produced at all. This paper improves the Bayesian algorithm and proposes a Hadoop-based ranking filtering algorithm for massive data. We first fill in missing data using entropy theory to obtain a complete dataset. We then use the improved Bayesian algorithm to compute, for each item, a probability based on its sales volume on the current day. Items whose probability falls below a threshold are filtered out and excluded from the ranking computation. Simulations on four million Taobao sales records demonstrate the effectiveness and strong performance of the proposed algorithm.
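The filtering idea can be illustrated with a minimal, single-machine sketch. This is not the paper's improved Bayesian algorithm, which the abstract does not specify, and it omits the entropy-based filling and the Hadoop distribution. As a stand-in, it estimates the probability that an item's daily sales reach a cutoff using a Beta-prior (Laplace-smoothed) posterior mean over the item's sales history. The names `prob_reaches_cutoff` and `filter_candidates`, the `cutoff` (for example, the lowest sales volume currently in the ranking), and the prior parameters are all hypothetical.

```python
def prob_reaches_cutoff(history, cutoff, alpha=1.0, beta=1.0):
    """Hypothetical stand-in for the paper's probability estimate.

    Returns the posterior mean of P(daily sales >= cutoff) under a
    Beta(alpha, beta) prior, given a list of past daily sales volumes.
    """
    hits = sum(1 for sales in history if sales >= cutoff)
    return (hits + alpha) / (len(history) + alpha + beta)


def filter_candidates(sales_history, cutoff, threshold):
    """Keep only items whose estimated probability is at least `threshold`.

    Items below the threshold are dropped so they never enter the
    (expensive) ranking computation.
    """
    return [
        item
        for item, history in sales_history.items()
        if prob_reaches_cutoff(history, cutoff) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical daily sales histories for two items.
    history = {"A": [120, 150, 90, 200], "B": [5, 10, 0, 3]}
    print(filter_candidates(history, cutoff=100, threshold=0.3))
```

With these made-up numbers, item A reaches the cutoff on 3 of 4 days (estimate 4/6) and is kept, while item B never does (estimate 1/6) and is filtered out. The smoothing keeps items with short histories from receiving a probability of exactly 0 or 1.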