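Anomaly Detection Based on Average Feature Importance and Ensemble Learning
Authors: ZHUANG Rui, ZHANG Hao

Funding: Key Program of the National Natural Science Foundation of China (U1804263, U21A20472); China Scholarship Council Visiting Program for Young Key Teachers; Natural Science Foundation of Fujian Province (2021J01616, 2020J01130167, 2021J01625)
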
    Abstract:

    Anomaly detection systems play a crucial role in cyberspace security and provide effective protection for network security. On complex network traffic, a traditional single classifier often cannot achieve high detection accuracy and strong generalization ability at the same time. In addition, anomaly detection models built on the full feature set are often disturbed by redundant features, which reduces both detection efficiency and accuracy. To address these problems, this study proposes a model that combines feature selection based on average feature importance with ensemble learning. Decision tree (DT), random forest (RF), and extra trees (ET) are selected as base classifiers to build a voting ensemble model, and the average feature importance of the base classifiers, computed from the Gini index, is used for feature selection. Experimental results on several datasets show that the proposed ensemble model outperforms classical ensemble learning models and other well-known ensemble models for anomaly detection. Furthermore, the proposed feature selection method based on average feature importance raises the accuracy of the ensemble model by about 0.13% on average and reduces its training time by about 30% on average.
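    The pipeline summarized in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation. The abstract states only that DT, RF and ET are combined by voting and that their Gini-based feature importances are averaged for feature selection; the synthetic data from `make_classification`, hard (majority) voting, 100-tree forests, and the rule of keeping features until their cumulative average importance reaches 0.9 are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def build_base_classifiers(seed=0):
    """Return fresh DT, RF and ET base learners, all splitting on Gini impurity."""
    return [
        ("dt", DecisionTreeClassifier(criterion="gini", random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, criterion="gini", random_state=seed)),
        ("et", ExtraTreesClassifier(n_estimators=100, criterion="gini", random_state=seed)),
    ]


def average_feature_importance(X, y, seed=0):
    """Fit each base learner and average its Gini-based (mean decrease in impurity)
    feature_importances_ vector; each vector sums to 1, so the average does too."""
    importances = [clf.fit(X, y).feature_importances_ for _, clf in build_base_classifiers(seed)]
    return np.mean(importances, axis=0)


def select_features(avg_importance, cumulative=0.9):
    """Keep the highest-ranked features until their summed average importance reaches
    `cumulative`. ASSUMPTION: this threshold rule is illustrative, not from the paper."""
    order = np.argsort(avg_importance)[::-1]          # feature indices, most important first
    running = np.cumsum(avg_importance[order])
    k = min(int(np.searchsorted(running, cumulative)) + 1, len(order))
    return np.sort(order[:k])                         # selected column indices, ascending


def fit_voting_ensemble(X, y, seed=0):
    """Majority (hard) voting over DT, RF and ET. ASSUMPTION: the voting type is not stated."""
    return VotingClassifier(estimators=build_base_classifiers(seed), voting="hard").fit(X, y)


# Synthetic stand-in for a labelled traffic dataset (ASSUMPTION: not one of the paper's datasets).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           n_redundant=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

avg_imp = average_feature_importance(X_tr, y_tr)      # computed on training data only
selected = select_features(avg_imp, cumulative=0.9)

full_model = fit_voting_ensemble(X_tr, y_tr)
reduced_model = fit_voting_ensemble(X_tr[:, selected], y_tr)

print(f"kept {len(selected)} of {X.shape[1]} features")
print(f"accuracy, all features:      {full_model.score(X_te, y_te):.4f}")
print(f"accuracy, selected features: {reduced_model.score(X_te[:, selected], y_te):.4f}")
```

    Importances are computed on the training split only, so the selected feature subset carries no information from the test data. A fixed top-k cut or a minimum-importance threshold would be an equally plausible reading of the abstract; only `select_features` would change.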

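Cite this article:

Zhuang R, Zhang H. Anomaly detection based on average feature importance and ensemble learning. Computer Systems & Applications (计算机系统应用), 2023, 32(6): 60–69.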

History
  • Received: 2022-11-10
  • Last revised: 2022-12-10
  • Published online: 2023-04-14
Copyright: Institute of Software, Chinese Academy of Sciences