Rankings Filtering Algorithm of Massive Data Based on Hadoop and its Application

    Abstract:

    Rankings, a popular product of modern society, have become part of everyday life. Computing rankings over massive data consumes substantial hardware resources and time even in a distributed environment, and sometimes the rankings cannot be produced at all. This paper improves the Bayesian algorithm and proposes a Hadoop-based rankings filtering algorithm for massive data. We first fill in missing data using entropy theory to obtain a complete data set. Then we use the improved Bayesian algorithm to compute the probability associated with each item's sales volume on the current day. If this probability is below a threshold, the item is filtered out and excluded from the ranking computation. Simulations on four million Taobao sales records show that the proposed algorithm is effective and performs well.
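The abstract does not give the improved Bayesian formula or the entropy-based filling step, so the following Python sketch only illustrates the general filter-then-rank idea on a single machine. Everything in it is an assumption for illustration, not the authors' method: `smoothed_probability` (a Laplace-smoothed frequency of past days on which an item's sales reached a hypothetical `cutoff`) stands in for the Bayesian estimate, and `filtered_ranking`, `threshold`, and `k` are hypothetical names. In a Hadoop deployment, the per-item summation in step 2 would presumably run as a MapReduce job.

```python
from collections import defaultdict
from heapq import nlargest


def smoothed_probability(history, cutoff, alpha=1.0):
    """HYPOTHETICAL stand-in for the paper's improved Bayesian estimate.

    Returns the Laplace-smoothed fraction of past days on which the item's
    daily sales reached `cutoff`. An item with no history gets 0.5.
    """
    hits = sum(1 for daily_sales in history if daily_sales >= cutoff)
    return (hits + alpha) / (len(history) + 2 * alpha)


def filtered_ranking(records, history_by_item, cutoff, threshold, k):
    """Filter-then-rank sketch (assumed structure, single machine).

    records:         today's raw sales as (item_id, quantity) pairs
    history_by_item: item_id -> list of past daily sales totals
    """
    # Step 1: keep only items whose estimated probability reaches the threshold.
    items = {item for item, _ in records}
    candidates = {
        item for item in items
        if smoothed_probability(history_by_item.get(item, []), cutoff) >= threshold
    }

    # Step 2: aggregate today's sales for candidate items only
    # (on Hadoop this would be the map/reduce summation).
    totals = defaultdict(int)
    for item, quantity in records:
        if item in candidates:
            totals[item] += quantity

    # Step 3: top-k by today's total sales among the surviving items.
    return nlargest(k, totals.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    history = {"a": [120, 90, 150, 130], "b": [5, 3, 8, 2], "c": [80, 110, 95, 105]}
    today = [("a", 40), ("b", 7), ("a", 70), ("c", 60), ("c", 55), ("b", 1)]
    print(filtered_ranking(today, history, cutoff=100, threshold=0.3, k=2))
    # prints [('c', 115), ('a', 110)]
```

In this toy run, item `b` has an estimated probability of 1/6, below the 0.3 threshold, so its records never reach the aggregation step. Filtering before aggregation is what saves work; the trade-off is that a threshold set too high can drop an item that happens to sell well on the current day.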

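Citation:

Huang Decai, Chen Huan. Rankings filtering algorithm for massive data on the Hadoop platform (Hadoop 平台下海量数据排行榜过滤算法). Computer Systems & Applications (计算机系统应用), 2012, 21(3): 111-115, 124.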

History
  • Received: July 6, 2011
  • Revised: August 24, 2011
Copyright: Institute of Software, Chinese Academy of Sciences