Anomaly Detection Based on Second-order Proximity

Author: 卢梦茹, 周昌军, 刘华文, 徐晓丹

Funding: National Natural Science Foundation of China (61976195)
Abstract:

The analysis of numerous and intricate data sets is a highly challenging task, in which techniques for detecting outliers in data play a pivotal role. Capturing anomalies through clustering is one of the most common approaches among the increasingly popular anomaly detection techniques. This study proposes an anomaly detection algorithm based on second-order proximity (SOPD), which consists of a clustering stage and an anomaly detection stage. During clustering, the similarity matrix is obtained through second-order proximity. During anomaly detection, the distance between each point in a cluster and that cluster's center is computed to capture anomalous states, and the density of each data point is taken into account to rule out points that merely lie on cluster boundaries. Second-order proximity allows the locality and the globality of the data to be considered simultaneously, which reduces the number of obtained clusters and increases the accuracy of anomaly detection. Extensive experiments comparing the algorithm with several classical anomaly detection algorithms show that SOPD performs well overall.
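To make the pipeline described in the abstract concrete, the following is a minimal Python sketch of the two stages it outlines: a second-order similarity matrix built from first-order similarities, a clustering step over that matrix, and an anomaly score that combines distance to the cluster center with local density. The specific choices here (an RBF kernel for first-order similarity, scikit-learn's AffinityPropagation for clustering, and a top-k mean similarity as the density estimate) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import cosine_similarity, rbf_kernel


def sopd_scores(X, top_k_density=10):
    """Sketch of the SOPD idea: second-order similarity -> clustering ->
    center-distance scores damped by local density (higher = more anomalous)."""
    # First-order similarity between samples (assumed: RBF kernel).
    first_order = rbf_kernel(X)

    # Second-order proximity: two points are similar if their first-order
    # similarity profiles (matrix rows) are similar.
    second_order = cosine_similarity(first_order)

    # Cluster on the precomputed second-order similarity matrix
    # (AffinityPropagation is an assumption; the paper's clustering step may differ).
    labels = AffinityPropagation(affinity="precomputed",
                                 random_state=0).fit_predict(second_order)

    scores = np.zeros(len(X))
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        center = X[members].mean(axis=0)
        dist_to_center = np.linalg.norm(X[members] - center, axis=1)

        # Local density as the mean similarity to the top-k most similar points;
        # dense points near a cluster boundary get their scores damped.
        density = np.sort(second_order[members], axis=1)[:, -top_k_density:].mean(axis=1)
        scores[members] = dist_to_center / (density + 1e-12)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)),    # normal points
                   rng.uniform(-6, 6, (5, 2))])   # injected outliers
    print(sopd_scores(X).round(2))
```

Points with the largest scores are flagged as anomalies; in practice one would threshold the scores or take the top-n, as distance- and density-based detectors such as LOF typically do.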

Cite this article:

卢梦茹, 周昌军, 刘华文, 徐晓丹. Anomaly detection based on second-order proximity. 计算机系统应用 (Computer Systems & Applications), 2023, 32(2): 160–169.
History
  • Received: 2022-07-14
  • Revised: 2022-09-07
  • Published online: 2022-11-29