Low-Cost Algorithm for Stream Data Classification
Author: 李南 (Li Nan)
Funding: Natural Science Foundation of Fujian Province (2013J01216, 2016J01280)


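    Abstract (translated from the Chinese original):

    Most existing data stream classification algorithms rely on supervised learning, but labeling samples on a high-speed data stream is very costly, which limits their practical use. To address this problem, a low-cost data stream classification algorithm, 2SDC, is proposed. The new algorithm trains and updates the classification model using a small number of labeled samples together with a large number of unlabeled samples, and it dynamically monitors the data stream for concept drift. Experiments on real data streams show that 2SDC not only achieves classification accuracy comparable to current supervised classification algorithms but also adapts to concept drift in the data stream.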

    Abstract:

    Existing classification algorithms for data streams are mainly based on supervised learning, while manually labeling instances that arrive continuously at high speed requires considerable effort. A low-cost learning algorithm for stream data classification, named 2SDC, is proposed to solve this problem. Using a few labeled instances and a large number of unlabeled instances, 2SDC trains the classification model and then updates it. The proposed algorithm can also detect potential concept drift in the data stream and adjust the classification model to the current concept. Experimental results show that the accuracy of 2SDC is comparable to that of state-of-the-art supervised algorithms.
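The abstract describes 2SDC only at a high level, so its actual model and drift test are not given here. The following is a minimal sketch of the general idea it names (per-chunk training on a few labeled and many unlabeled instances, plus a concept-drift check). It assumes a nearest-centroid model with self-training (pseudo-labeling) and a simple accuracy-drop drift test. The class `ChunkSelfTrainer` and its parameters `drift_drop` and `rounds` are hypothetical illustrations, not the 2SDC algorithm.

```python
import math
from collections import defaultdict


def _mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


class ChunkSelfTrainer:
    """Hypothetical chunk-based semi-supervised stream classifier.

    Illustrative only: nearest-centroid self-training plus a simple
    accuracy-drop drift check. This is NOT the 2SDC algorithm.
    """

    def __init__(self, drift_drop=0.2, rounds=3):
        self.centroids = {}           # class label -> centroid tuple
        self.best_acc = None          # best pre-update accuracy since the last reset
        self.drift_drop = drift_drop  # assumed: accuracy drop that signals drift
        self.rounds = rounds          # assumed: self-training rounds per chunk
        self.drifts = 0               # number of drifts detected so far

    def predict(self, x):
        """Label of the nearest centroid (requires at least one trained class)."""
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))

    def _fit_chunk(self, labeled, unlabeled):
        # Seed each class seen in this chunk with the mean of its labeled points.
        by_class = defaultdict(list)
        for x, y in labeled:
            by_class[y].append(x)
        for y, xs in by_class.items():
            self.centroids[y] = _mean(xs)
        if not self.centroids:
            return  # nothing to train from yet
        # Self-training: pseudo-label unlabeled points, then recompute centroids.
        for _ in range(self.rounds):
            groups = defaultdict(list)
            for x, y in labeled:
                groups[y].append(x)
            for x in unlabeled:
                groups[self.predict(x)].append(x)
            for y, xs in groups.items():
                self.centroids[y] = _mean(xs)

    def process_chunk(self, labeled, unlabeled):
        """Test on the chunk's labeled points, check for drift, then update.

        Returns True if a drift was signalled (and the model was rebuilt).
        """
        drift = False
        if self.centroids and labeled:
            acc = sum(self.predict(x) == y for x, y in labeled) / len(labeled)
            if self.best_acc is not None and acc < self.best_acc - self.drift_drop:
                drift = True
                self.drifts += 1
                self.centroids = {}   # discard the outdated concept
                self.best_acc = None
            else:
                self.best_acc = acc if self.best_acc is None else max(self.best_acc, acc)
        self._fit_chunk(labeled, unlabeled)
        return drift
```

For example, after two chunks in which class "a" lies near (0, 0) and class "b" near (10, 10), a third chunk whose labels are swapped drops the pre-update accuracy to 0, so `process_chunk` returns `True` and rebuilds the centroids from that chunk alone. Resetting on drift is the bluntest choice; an ensemble that reweights or replaces per-chunk models would forget the old concept more gradually.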

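Cite this article:

李南 (Li Nan). 低代价的数据流分类算法 [Low-Cost Algorithm for Stream Data Classification]. 计算机系统应用 (Computer Systems & Applications), 2016, 25(12): 187-192.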

History
  • Received: 2016-03-29
  • Last revised: 2016-06-01
  • Published online: 2016-12-14