Cross-project Defect Prediction Based on Dynamic Distribution Alignment and Pseudo-label Learning

DOI: 10.15888/j.cnki.csa.009558
CSTR: 32024.14.csa.009558

2024, Vol. 33, No. 8, pp. 40-50

Author: GAO Qin-Qin, LING Song-Song, YU Jie, YU Xu

Affiliation: School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China

Fund Project: National Natural Science Foundation of China (62172249); Fundamental Research Funds for the Central Universities (93K172022K01)


Abstract:

Cross-project defect prediction (CPDP) has become an important research direction in software engineering and data mining. By using defect data from other data-rich projects to build prediction models, it addresses the shortage of training data during model construction. However, the distribution difference between the code files of the source and target projects leads to poor cross-project prediction performance. Most studies adopt domain adaptation methods to address this problem, but existing methods have two shortcomings: they consider only the influence of either the conditional or the marginal distribution on defect prediction and ignore its dynamics, and they fail to select appropriate pseudo-labels. To address both issues, this study proposes a cross-project defect prediction method based on dynamic distribution alignment and pseudo-label learning (DPLD). Specifically, the method uses adversarial domain adaptation to reduce the marginal distribution difference between projects in a domain alignment module and the conditional distribution difference in a category alignment module, and it uses a dynamic distribution factor to characterize, dynamically and quantitatively, the relative importance of the two distributions. In addition, this study proposes a pseudo-label learning method that uses the geometric similarity between data to improve the accuracy of pseudo-labels used in place of real labels. In experiments on the PROMISE dataset, DPLD achieves average improvements of 22.98% in F-measure and 15.21% in AUC, demonstrating its effectiveness in reducing distribution differences between projects and improving cross-project defect prediction performance.

Key words: domain adaptation; cross-project defect prediction; conditional distribution; marginal distribution; pseudo-label learning
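
The abstract names two mechanisms without giving formulas: a dynamic distribution factor that balances marginal and conditional alignment, and pseudo-labels chosen by geometric similarity. The sketch below illustrates one plausible reading of each and should not be taken as the DPLD formulation. It assumes that the factor is each discrepancy's share of the total, estimated with the proxy A-distance 2(1 - 2 * err) from a domain discriminator's error rate, and that "geometric similarity" means Euclidean distance to source class centroids. All function names (`proxy_a_distance`, `dynamic_factor`, `combined_alignment_loss`, `centroid_pseudo_labels`) are hypothetical.

```python
import math


def proxy_a_distance(disc_error):
    """Proxy A-distance 2 * (1 - 2 * err) from a domain discriminator's error rate."""
    # Assumption: errors above 0.5 are clipped, meaning the domains are not separable.
    err = min(max(disc_error, 0.0), 0.5)
    return 2.0 * (1.0 - 2.0 * err)


def dynamic_factor(marginal_err, class_errs):
    """Share of the total discrepancy attributed to the conditional distributions.

    marginal_err: error of a discriminator over all samples (domain alignment).
    class_errs:   errors of per-class discriminators (category alignment),
                  e.g. one for defective and one for clean modules.
    """
    d_marg = proxy_a_distance(marginal_err)
    d_cond = sum(proxy_a_distance(e) for e in class_errs) / len(class_errs)
    total = d_marg + d_cond
    if total == 0.0:
        return 0.5  # no measurable discrepancy: weight both terms equally
    return d_cond / total


def combined_alignment_loss(loss_marg, loss_cond, mu):
    """Weight the two alignment losses by the factor mu in [0, 1]."""
    return (1.0 - mu) * loss_marg + mu * loss_cond


def centroid_pseudo_labels(src_X, src_y, tgt_X):
    """Give each target sample the label of the nearest source class centroid."""
    groups = {}
    for x, y in zip(src_X, src_y):
        groups.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in groups.items()
    }
    return [
        min(centroids, key=lambda y: math.dist(x, centroids[y]))
        for x in tgt_X
    ]


if __name__ == "__main__":
    mu = dynamic_factor(0.25, [0.25, 0.25])
    print(mu, combined_alignment_loss(2.0, 4.0, mu))
    print(centroid_pseudo_labels(
        [[0, 0], [0, 2], [10, 10], [10, 12]], [0, 0, 1, 1], [[1, 1], [9, 11]]
    ))
```

In a training loop the factor would be recomputed periodically (for example once per epoch) from the current discriminator errors, so the weighting follows the changing discrepancies rather than staying fixed.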

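Cite this article

GAO Qin-Qin, LING Song-Song, YU Jie, YU Xu. Cross-project defect prediction based on dynamic distribution alignment and pseudo-label learning. Computer Systems & Applications, 2024, 33(8): 40-50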


History

Received: 2024-01-14
Last revised: 2024-02-07
Published online: 2024-07-03
