基于共享近邻密度峰值聚类的过采样方法
CSTR:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

国家自然科学基金(11801436)


Oversampling Method Based on Shared Nearest Neighbors for Density Peak Clustering
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    不平衡数据集中存在噪声和类重叠问题时, 传统分类器性能较低, 导致少数类样本难以被准确分类. 为了提高分类性能, 提出一种基于共享近邻密度峰值聚类和集成过滤机制的不平衡数据处理方法. 该方法首先利用共享近邻密度峰值聚类算法将少数类样本自适应地分为多个簇, 然后根据子簇内密度和大小分配过采样权重; 在子簇内合成时考虑使用样本的局部稀疏度和多类聚集度选择近邻样本以及确定线性插值的权重范围, 避免新样本生成于多数类聚集区域; 最后, 引入集成过滤机制剔除噪声和难以学习的边界样本以规范决策边界和提高生成样本的质量. 与5种过采样方法相比, 本文算法在8个公开数据集上整体表现更优.

    Abstract:

    In imbalanced datasets, the presence of noise and class overlapping often leads to poor performance of traditional classifiers, resulting in minority class samples being difficult to classify accurately. To improve classification performance, a method for handling imbalanced data based on shared nearest neighbor density peak clustering and ensemble filtering mechanism is proposed. This method first uses the shared nearest neighbor density peak clustering algorithm to adaptively divide the minority class samples into multiple clusters. Then, based on the density and size within the clusters, oversampling weights are allocated to each cluster. During the synthesis within clusters, the local sparsity and clustering coefficient of the samples are considered to select neighboring samples and determine the weight range of linear interpolation, thus avoiding the generation of new samples in the majority class aggregation area. Finally, an ensemble filtering mechanism is introduced to eliminate noise and hard-to-learn boundary samples to regulate the decision boundary and improve the quality of generated samples. Compared with 5 oversampling methods, this algorithm performs better overall on 8 public datasets.

    参考文献
    相似文献
    引证文献
引用本文

李红玲,王彪.基于共享近邻密度峰值聚类的过采样方法.计算机系统应用,2024,33(10):245-254

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2024-03-13
  • 最后修改日期:2024-04-10
  • 录用日期:
  • 在线发布日期: 2024-08-21
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京海淀区中关村南四街4号 中科院软件园区 7号楼305房间,邮政编码:100190
电话:010-62661041 传真: Email:csa (a) iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号