数据流环境下基于距离的离群点检测算法
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

国家自然科学基金(61702344)


Distance-based Outlier Detection Algorithm in Data Streams
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 增强出版
  • |
  • 文章评论
    摘要:

    面向滑动窗口的连续离群点检测问题是数据流管理领域中的重要问题. 该问题在信用卡欺诈检测、网络入侵防御, 地质灾害预警等诸多领域发挥着重要作用. 现有算法大多需要利用范围查询判断对象之间的位置关系, 而范围查询的查询代价大, 无法满足实时性要求. 本文提出基于滑动窗口模型下的查询处理框架GBEH (grid-based excepted heap). 首先, 它以网格为基础构建索引GQBI (grid queue based index)管理数据流. 该索引一方面维护数据流之间的位置关系, 另一方面利用队列维护数据流的时序关系. 其次, GBEH提出离群点检测算法PBH (priority based heap). 该算法利用查询范围与网格单元格的相交面积计算该单元格中包含于查询范围对象数目的数学期望, 并以此为基础构建基于小顶堆执行范围查询, 从而有效降低范围查询代价, 实现高效检测. 理论分析和实验验证GBEH的高效性和稳定性.

    Abstract:

    The detection of continuous outliers for sliding windows is an important problem in data stream management, which plays an important role in many fields such as credit card fraud detection, network intrusion prevention, and early warning for geological hazards. Most of the existing algorithms require the use of the range query to determine the positional relationship between objects, but the cost of the range query is usually high, which cannot meet real-time requirements. Therefore, this study proposes the grid-based excepted heap (GBEH), a query processing framework based on sliding windows. Specifically, GBEH proposes a grid queue based index (GQBI) on the basis of the grid to manage data streams, which maintains the positional relationship between data streams and the temporal relationship of data streams. Furthermore, GBEH proposes an outlier detection algorithm, namely, the priority based heap. This algorithm calculates the mathematical expectation of the number of objects in the cell that is included in the query range by use of the intersection area of the query range and the cell and on this basis, establishes an execution range query based on the min-heap. In this way, it effectively reduces the cost of range queries and achieves efficient detection. Theoretical analysis and experiments verify the efficiency and stability of GBEH.

    参考文献
    相似文献
    引证文献
引用本文

祝一帆,安云哲,夏秀峰.数据流环境下基于距离的离群点检测算法.计算机系统应用,2023,32(1):214-223

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2022-04-28
  • 最后修改日期:2022-06-01
  • 录用日期:
  • 在线发布日期: 2022-11-18
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京海淀区中关村南四街4号 中科院软件园区 7号楼305房间,邮政编码:100190
电话:010-62661041 传真: Email:csa (a) iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号