Oversampling Method Based on Shared Nearest Neighbors for Density Peak Clustering (基于共享近邻密度峰值聚类的过采样方法)

doi:10.15888/j.cnki.csa.009607

2024, Volume 33, Issue 10, pp. 245-254

CSTR: 32024.14.csa.009607
Authors: 李红玲, 王彪
Funding: National Natural Science Foundation of China (11801436)

Oversampling Method Based on Shared Nearest Neighbors for Density Peak Clustering

Abstract:

When an imbalanced dataset also contains noise and class overlap, traditional classifiers perform poorly and minority-class samples are hard to classify accurately. To improve classification performance, this paper proposes a method for handling imbalanced data based on shared nearest neighbor density peak clustering and an ensemble filtering mechanism. The method first uses the shared nearest neighbor density peak clustering algorithm to adaptively partition the minority-class samples into multiple clusters, then assigns an oversampling weight to each sub-cluster according to its density and size. When synthesizing samples within a sub-cluster, it uses each sample's local sparsity and majority-class aggregation degree to select neighboring samples and to determine the weight range of the linear interpolation, which keeps new samples from being generated in regions where the majority class is concentrated. Finally, an ensemble filtering mechanism removes noise and hard-to-learn boundary samples in order to regularize the decision boundary and improve the quality of the generated samples. Compared with 5 oversampling methods, the proposed algorithm performs better overall on 8 public datasets.
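The two middle steps of the abstract (per-cluster oversampling weights, then bounded linear interpolation inside each cluster) can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract gives no formulas, so the scoring rule in `allocate_counts`, the fixed `max_gap` cap, and every function name below are hypothetical stand-ins. The clusters are assumed to be already available and non-empty; the shared nearest neighbor density peak clustering and the ensemble filter are not shown.

```python
import math
import random


def mean_nn_distance(cluster):
    """Average distance from each point to its nearest neighbor in the cluster."""
    if len(cluster) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(cluster):
        total += min(math.dist(p, q) for j, q in enumerate(cluster) if j != i)
    return total / len(cluster)


def allocate_counts(clusters, n_new):
    """Split n_new synthetic samples across non-empty clusters.

    Assumed rule (not from the paper): a cluster's score is its mean
    nearest-neighbor distance divided by its size, so sparse, small
    clusters receive more synthetic samples.
    """
    scores = [mean_nn_distance(c) / len(c) for c in clusters]
    total = sum(scores)
    if total == 0:
        # All clusters are degenerate: fall back to an even split.
        scores, total = [1.0] * len(clusters), float(len(clusters))
    counts = [int(n_new * s / total) for s in scores]
    # Hand out the remainder left by rounding down, largest score first.
    order = sorted(range(len(clusters)), key=lambda i: scores[i], reverse=True)
    for i in order[: n_new - sum(counts)]:
        counts[i] += 1
    return counts


def synthesize(cluster, n_samples, k=3, max_gap=0.5, rng=None):
    """SMOTE-style interpolation restricted to a single cluster.

    max_gap caps the interpolation weight. The paper derives this range
    from local sparsity and majority-class aggregation; a fixed,
    hypothetical constant is used here instead.
    """
    rng = rng or random.Random(0)
    new_points = []
    for _ in range(n_samples):
        idx = rng.randrange(len(cluster))
        base = cluster[idx]
        others = [q for j, q in enumerate(cluster) if j != idx]
        if not others:
            # A single-point cluster can only be duplicated.
            new_points.append(tuple(base))
            continue
        neighbors = sorted(others, key=lambda q: math.dist(base, q))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.uniform(0.0, max_gap)
        new_points.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return new_points


# Toy minority class that has already been split into two clusters.
clusters = [
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)],  # dense, larger
    [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],              # sparse, smaller
]
counts = allocate_counts(clusters, n_new=10)
synthetic = [p for c, n in zip(clusters, counts) for p in synthesize(c, n)]
```

Because every synthetic point is a convex combination of two points from the same cluster, it stays inside that cluster's extent, which is the property the abstract relies on to keep new samples out of majority-class regions. Capping the interpolation weight below 1 keeps each new point closer to its base sample than to the chosen neighbor.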

Cite this article

李红玲, 王彪. Oversampling Method Based on Shared Nearest Neighbors for Density Peak Clustering. 计算机系统应用, 2024, 33(10): 245-254

History

Received: 2024-03-13
Last revised: 2024-04-10
Published online: 2024-08-21
