本文已被:浏览 558次 下载 1346次
Received:April 28, 2022 Revised:June 01, 2022
Received:April 28, 2022 Revised:June 01, 2022
中文摘要: 面向滑动窗口的连续离群点检测问题是数据流管理领域中的重要问题. 该问题在信用卡欺诈检测、网络入侵防御, 地质灾害预警等诸多领域发挥着重要作用. 现有算法大多需要利用范围查询判断对象之间的位置关系, 而范围查询的查询代价大, 无法满足实时性要求. 本文提出基于滑动窗口模型下的查询处理框架GBEH (grid-based excepted heap). 首先, 它以网格为基础构建索引GQBI (grid queue based index)管理数据流. 该索引一方面维护数据流之间的位置关系, 另一方面利用队列维护数据流的时序关系. 其次, GBEH提出离群点检测算法PBH (priority based heap). 该算法利用查询范围与网格单元格的相交面积计算该单元格中包含于查询范围对象数目的数学期望, 并以此为基础构建基于小顶堆执行范围查询, 从而有效降低范围查询代价, 实现高效检测. 理论分析和实验验证GBEH的高效性和稳定性.
Abstract:The detection of continuous outliers for sliding windows is an important problem in data stream management, which plays an important role in many fields such as credit card fraud detection, network intrusion prevention, and early warning for geological hazards. Most of the existing algorithms require the use of the range query to determine the positional relationship between objects, but the cost of the range query is usually high, which cannot meet real-time requirements. Therefore, this study proposes the grid-based excepted heap (GBEH), a query processing framework based on sliding windows. Specifically, GBEH proposes a grid queue based index (GQBI) on the basis of the grid to manage data streams, which maintains the positional relationship between data streams and the temporal relationship of data streams. Furthermore, GBEH proposes an outlier detection algorithm, namely, the priority based heap. This algorithm calculates the mathematical expectation of the number of objects in the cell that is included in the query range by use of the intersection area of the query range and the cell and on this basis, establishes an execution range query based on the min-heap. In this way, it effectively reduces the cost of range queries and achieves efficient detection. Theoretical analysis and experiments verify the efficiency and stability of GBEH.
keywords: data stream outlier distance-based objects maintenance
文章编号: 中图分类号: 文献标志码:
基金项目:国家自然科学基金(61702344)
引用文本:
祝一帆,安云哲,夏秀峰.数据流环境下基于距离的离群点检测算法.计算机系统应用,2023,32(1):214-223
ZHU Yi-Fan,AN Yun-Zhe,XIA Xiu-Feng.Distance-based Outlier Detection Algorithm in Data Streams.COMPUTER SYSTEMS APPLICATIONS,2023,32(1):214-223
祝一帆,安云哲,夏秀峰.数据流环境下基于距离的离群点检测算法.计算机系统应用,2023,32(1):214-223
ZHU Yi-Fan,AN Yun-Zhe,XIA Xiu-Feng.Distance-based Outlier Detection Algorithm in Data Streams.COMPUTER SYSTEMS APPLICATIONS,2023,32(1):214-223