Cross-project Defect Prediction Based on Dynamic Distribution Alignment and Pseudo-label Learning
Author: Gao Qinqin, Ling Songsong, Yu Jie, Yu Xu
    Abstract:

    Cross-project defect prediction (CPDP) has become an important research topic in software engineering and data mining. Building prediction models from the defect data of other, data-rich projects alleviates the shortage of training data for the target project. However, the distribution differences between the code files of the source and target projects degrade cross-project prediction performance. Most studies address this problem with domain adaptation methods, but existing methods consider only the conditional or only the marginal distribution and ignore how the relative importance of the two changes dynamically. In addition, they do not select reliable pseudo-labels. To address these two issues, this study proposes a cross-project defect prediction method based on dynamic distribution alignment and pseudo-label learning (DPLD). Specifically, using adversarial domain adaptation, the proposed method reduces the marginal distribution difference between projects in a domain alignment module and the conditional distribution difference in a category alignment module. It also uses dynamic distribution factors to quantify, as training proceeds, the relative importance of the two distributions. Furthermore, this study proposes a pseudo-label learning method that exploits the geometric similarity between samples to bring pseudo-labels closer to the true labels. Experiments on the PROMISE dataset show that DPLD achieves average improvements of 22.98% in F-measure and 15.21% in AUC. These results demonstrate that DPLD effectively reduces distribution differences between projects and improves cross-project defect prediction performance.
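    The abstract does not give DPLD's formulas, so the following is only a minimal plain-Python sketch of the two ideas it names, under stated assumptions. It assumes (1) that the discrepancy of a distribution is estimated with the proxy A-distance d_A = 2(1 - 2ε), where ε is a domain discriminator's error rate, in the spirit of the dynamic factor of Yu et al.'s dynamic adversarial adaptation network; (2) a hypothetical convention in which the marginal (domain) alignment loss gets weight w = d_g / (d_g + mean_c d_c) and the conditional (category) alignment loss gets 1 - w, so whichever distribution currently differs more is weighted more; and (3) that "geometric similarity" means cosine similarity to source-class centroids, with a target sample receiving a pseudo-label only when its best class beats the runner-up by a margin. All function names and the `min_margin` threshold are hypothetical.

```python
"""Minimal sketch; these are assumptions, not DPLD's published formulas.

- proxy A-distance from a domain discriminator's error rate
- hypothetical dynamic weight between marginal and conditional alignment
- hypothetical centroid/cosine pseudo-labelling with a confidence margin
"""
from math import sqrt
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def proxy_a_distance(error: float) -> float:
    """d_A = 2(1 - 2*error), clamped at 0 when the discriminator is no better than chance."""
    return max(0.0, 2.0 * (1.0 - 2.0 * error))


def dynamic_weight(global_error: float, class_errors: Sequence[float]) -> float:
    """Hypothetical factor w: weight of the marginal (domain) alignment loss.

    The conditional (category) alignment loss gets 1 - w, so the distribution
    with the larger estimated discrepancy receives more weight.
    """
    d_marginal = proxy_a_distance(global_error)
    d_conditional = sum(proxy_a_distance(e) for e in class_errors) / len(class_errors)
    total = d_marginal + d_conditional
    return 0.5 if total == 0.0 else d_marginal / total


def alignment_loss(marginal_loss: float, conditional_loss: float, w: float) -> float:
    """Combine the two alignment terms with the dynamic weight w."""
    return w * marginal_loss + (1.0 - w) * conditional_loss


def class_centroids(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each labelled source class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    na, nb = sqrt(sum(p * p for p in a)), sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def pseudo_labels(target: Sequence[Vector], centroids: Dict[int, List[float]],
                  min_margin: float = 0.1) -> List[Optional[int]]:
    """Give each target sample the label of its most similar centroid, or None if ambiguous.

    Assumes at least two classes (e.g. defective / clean).
    """
    result: List[Optional[int]] = []
    for x in target:
        ranked = sorted(((cosine(x, c), y) for y, c in centroids.items()),
                        key=lambda t: t[0], reverse=True)
        (best, label), (runner_up, _) = ranked[0], ranked[1]
        result.append(label if best - runner_up >= min_margin else None)
    return result


if __name__ == "__main__":
    w = dynamic_weight(0.1, [0.3, 0.4])
    print(round(w, 3))  # > 0.5: the marginal gap is currently the larger one
    src = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    cents = class_centroids(src, [0, 0, 1, 1])
    print(pseudo_labels([[1.0, 0.0], [0.5, 0.5]], cents))
```

    With a discriminator error of 0.1 on the whole domain and 0.3 and 0.4 on the two classes, w is about 0.73, so marginal alignment dominates; the ambiguous target sample [0.5, 0.5] receives no pseudo-label. In an adversarial setup the combined alignment term would be minimized by the discriminators and, through a gradient reversal layer (Ganin and Lempitsky), maximized by the feature extractor; that training loop is omitted here.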

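Citation:

Gao QQ, Ling SS, Yu J, Yu X. Cross-project defect prediction based on dynamic distribution alignment and pseudo-label learning. Computer Systems & Applications, 2024, 33(8): 40–50.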

History
  • Received: January 14, 2024
  • Revised: February 7, 2024
  • Online: July 3, 2024
Copyright: Institute of Software, Chinese Academy of Sciences