Oversampling Method for Software Defect Prediction

    Abstract:

    To alleviate the class imbalance problem in software defect prediction and limit the effect of overfitting on the accuracy of defect prediction models, this study proposes an oversampling method for software defect prediction based on heterogeneous distance ranking (HDR). First, minority-class instances are divided into three categories so that noise instances can be removed, reducing the overfitting caused by noisy data. Next, instances are ranked by heterogeneous distance and each is paired with highly similar instances to generate new ones, which increases the diversity of the synthetic instances. Finally, valuable minority instances that were deleted earlier are restored. In the experiments, HDR is compared with SMOTE and Borderline-SMOTE using a random forest (RF) classifier on eight real-world NASA project data sets. The results show improvements of 7.7% in F1-measure and 10.6% in G-Mean, and HDR performs significantly better than the other algorithms on software defect data sets with large data volumes and high imbalance ratios.
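    The abstract names the steps of HDR but not their exact rules, so the Python sketch below only illustrates how such a pipeline could fit together; it is not the authors' implementation. Several parts are assumptions: HEOM (Heterogeneous Euclidean-Overlap Metric) stands in for the paper's heterogeneous distance, the three minority categories use a Borderline-SMOTE-style noise/danger/safe rule based on k nearest neighbours, and the step 3 restoration criterion is purely hypothetical. The function names `heom`, `knn`, `hdr_like_oversample` and `f1_and_gmean` are illustrative; the last one uses the standard definitions of the two reported metrics.

```python
import math
import random

# Sketch only: HEOM, the noise/danger/safe rule and the restore criterion
# are assumptions, not the published HDR rules.


def heom(a, b, nominal, ranges):
    """Heterogeneous Euclidean-Overlap Metric: 0/1 overlap for nominal
    features, range-normalised absolute difference for numeric ones."""
    total = 0.0
    for x, y, is_nom, r in zip(a, b, nominal, ranges):
        if is_nom:
            d = 0.0 if x == y else 1.0
        else:
            d = abs(x - y) / r if r > 0 else 0.0
        total += d * d
    return math.sqrt(total)


def knn(i, points, k, dist):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]


def hdr_like_oversample(X, y, nominal, n_new, k=5, seed=0):
    """Hypothetical HDR-style pipeline; label 1 marks the minority (defective) class.

    Returns the resampled rows, their labels, and the category assigned
    to each original minority instance.
    """
    rng = random.Random(seed)
    ranges = [0.0 if is_nom else max(col) - min(col)
              for col, is_nom in zip(zip(*X), nominal)]

    def dist(a, b):
        return heom(a, b, nominal, ranges)

    # Step 1 (assumed rule): split minority instances into three categories
    # by how many of their k nearest neighbours belong to the majority class.
    categories, kept, removed = {}, [], []
    for i, label in enumerate(y):
        if label != 1:
            continue
        n_major = sum(1 for j in knn(i, X, k, dist) if y[j] != 1)
        if n_major == k:
            categories[i] = "noise"
            removed.append(i)
        else:
            categories[i] = "danger" if n_major >= k / 2 else "safe"
            kept.append(i)

    # Step 2: rank the other kept minority instances by heterogeneous distance,
    # pair each seed with one of its k most similar partners, and interpolate
    # numeric features (nominal features are copied from either parent).
    synthetic = []
    for t in range(n_new if len(kept) >= 2 else 0):
        i = kept[t % len(kept)]
        ranked = sorted((j for j in kept if j != i), key=lambda j: dist(X[i], X[j]))
        j = rng.choice(ranked[:k])
        gap = rng.random()
        synthetic.append([rng.choice([a, b]) if is_nom else a + gap * (b - a)
                          for a, b, is_nom in zip(X[i], X[j], nominal)])

    # Step 3 (hypothetical criterion): restore a removed instance if, once the
    # synthetic samples exist, at least one of its k neighbours is minority.
    pool, pool_y = list(X) + synthetic, list(y) + [1] * len(synthetic)
    restored = [i for i in removed
                if any(pool_y[j] == 1 for j in knn(i, pool, k, dist))]

    keep = [i for i in range(len(X)) if y[i] != 1 or i in kept or i in restored]
    X_out = [X[i] for i in keep] + synthetic
    y_out = [y[i] for i in keep] + [1] * len(synthetic)
    return X_out, y_out, categories


def f1_and_gmean(tp, fp, fn, tn):
    """F1 = 2PR / (P + R); G-Mean = sqrt(recall * specificity)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, math.sqrt(recall * specificity)
```

    On a one-feature toy set where a single minority point sits among majority points, this sketch labels that point as noise and drops it, while the remaining minority points act as seeds for the synthetic samples. Keeping the distance function separate makes it easy to swap HEOM for another heterogeneous metric if the paper's exact choice differs.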

Citation:

Ji Xingzhe, Shao Peinan. Oversampling method for software defect prediction. Computer Systems & Applications, 2022, 31(1): 242-248.

History
  • Received: April 01, 2021
  • Revised: April 29, 2021
  • Online: December 17, 2021
Copyright: Institute of Software, Chinese Academy of Sciences