Received: December 24, 2020; Revised: January 25, 2021
Abstract: Unbalanced data sets arise in a growing range of applications, and the demands placed on their classification keep rising. To improve classification accuracy over the whole data set, this study proposes an unbalanced data mining method based on an autoencoder (self-encoding) network, with spectral clustering undersampling as its preprocessing step. The clustering problem is recast as a multi-way partitioning problem on an undirected graph, and spectral clustering is carried out through the undirected graph and normalization. The majority-class data are then selectively undersampled, which yields the offset of the classification boundary. An autoencoder network, whose training is unsupervised, raises and lowers the dimensionality of the data to extract the hidden features in each dimension, giving an efficient learned representation of the data at every level. The network is adjusted according to how the maximum mean discrepancy (MMD) compares with a preset threshold, and unbalanced data mining is completed on the resulting classification surface. Ten groups of data drawn from UCI data sets with different practical application backgrounds serve as test sets. After spectral clustering undersampling, simulation experiments show that the proposed method substantially improves minority-class classification accuracy and overall mining performance, indicating good applicability and feasibility.
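The adjustment step in the abstract compares the maximum mean difference (more commonly called maximum mean discrepancy, MMD) with a preset threshold. The abstract does not say which samples are compared, which kernel is used, or what the threshold is, so the following is only a minimal sketch under stated assumptions: two sets of feature vectors, a Gaussian (RBF) kernel with a hypothetical bandwidth `gamma`, the biased estimator of squared MMD, and an arbitrary threshold of 0.05. The function names are illustrative and are not taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma=1.0):
    """Average kernel value over all pairs (x, y), x from xs and y from ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))


def needs_adjustment(xs, ys, threshold=0.05, gamma=1.0):
    """True when the discrepancy exceeds the (assumed) preset threshold."""
    return mmd_squared(xs, ys, gamma) > threshold


# Toy feature vectors, invented for illustration only.
sample_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
sample_near = [[0.1, 0.1], [0.0, 0.2], [0.2, 0.1]]   # same region as sample_a
sample_far = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]    # clearly shifted

print(needs_adjustment(sample_a, sample_near))  # False: distributions overlap
print(needs_adjustment(sample_a, sample_far))   # True: distributions differ
```

In this sketch a `True` result would be the signal to keep adjusting the network; how the adjustment itself is performed is not described in the abstract and is left out here.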
Keywords: spectral clustering undersampling; self-encoding network; unbalanced data; classification boundary; clustering center
Citation:
WANG Shu-Fan, YAN Tao, JIANG Xin-Ying. Unbalanced Data Mining of Self-Encoding Network under Spectral Clustering Undersampling. COMPUTER SYSTEMS APPLICATIONS, 2021, 30(10): 331-335