Prediction of Duplicate Bug Reports Based on Semantically Extended Continuous Queries
Authors: Zhang Qianyue, Zhao Ruilian, Wang Weiwei
Funding: National Natural Science Foundation of China (62077003, 61872026)

    Abstract:

    As software projects grow in scale and complexity, testing produces a large number of bug reports, and duplicate bug reports are common among them. Duplicates reduce the efficiency with which developers fix bugs. Duplicate bug report prediction can effectively prevent duplicates from being created and has become an active research topic in recent years, but the efficiency and accuracy of existing methods still need to be improved. This study therefore proposes a duplicate bug report prediction method based on semantically extended continuous queries. The method builds a bug report index thesaurus based on a topic model, uses it to semantically extend the query word sequence, and then applies a continuous-query-based bug report retrieval algorithm, which narrows the index space while improving both prediction accuracy and efficiency. Experimental results show that, compared with traditional duplicate bug report prediction methods, the proposed method reduces the bug report index space by more than 50%, improves prediction performance by up to 33.6%, and shortens retrieval time by 41%–73%.
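    The abstract names three steps: a topic-model-based thesaurus, semantic extension of the query word sequence, and retrieval that is re-run continuously as the query grows. The following is a minimal Python sketch of how these steps could fit together; it is not the authors' implementation. The topic word lists are hard-coded stand-ins for the output of a trained topic model such as LDA, the 0.5 weight given to extension words and the weighted-overlap scoring are assumptions, and every name (`TOPICS`, `REPORTS`, `build_thesaurus`, `extend_query`, `build_index`, `search`, `continuous_query`) is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical topic -> top-words lists, standing in for a trained topic model (e.g. LDA).
TOPICS = {
    0: ["crash", "freeze", "hang", "startup"],
    1: ["login", "password", "authentication", "session"],
    2: ["render", "display", "font", "layout"],
}

# Hypothetical existing bug reports (id -> title).
REPORTS = {
    "B1": "app crash on startup",
    "B2": "login fails with wrong password",
    "B3": "font layout broken in dialog",
    "B4": "window freeze after update",
}


def build_thesaurus(topics):
    """Map each word to the other top words of every topic it appears in."""
    thesaurus = defaultdict(set)
    for words in topics.values():
        for w in words:
            thesaurus[w].update(x for x in words if x != w)
    return thesaurus


def extend_query(words, thesaurus):
    """Semantic extension: typed words get weight 1.0, topic neighbours 0.5 (assumed)."""
    weights = {}
    for w in words:
        for neighbour in thesaurus.get(w, ()):
            weights.setdefault(neighbour, 0.5)
    for w in words:
        weights[w] = 1.0  # a word the reporter actually typed always keeps full weight
    return weights


def build_index(reports):
    """Plain inverted index: word -> ids of reports whose title contains it."""
    index = defaultdict(set)
    for rid, text in reports.items():
        for w in text.lower().split():
            index[w].add(rid)
    return index


def search(weights, index, k=3):
    """Score reports by summed weights of matched words; return the top-k ids."""
    scores = Counter()
    for w, wt in weights.items():
        for rid in index.get(w, ()):
            scores[rid] += wt
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [rid for rid, _ in ranked[:k]]


def continuous_query(title, thesaurus, index, k=3):
    """Re-run the extended search after each word, as a reporter types the title."""
    typed = []
    for w in title.lower().split():
        typed.append(w)
        yield w, search(extend_query(typed, thesaurus), index, k)


if __name__ == "__main__":
    thesaurus = build_thesaurus(TOPICS)
    index = build_index(REPORTS)
    print(search(extend_query(["hang"], thesaurus), index))  # ['B1', 'B4']
    for word, hits in continuous_query("login hang", thesaurus, index):
        print(word, hits)  # login ['B2'], then hang ['B2', 'B1', 'B4']
```

    Querying `hang` alone returns B1 and B4 even though neither title contains that word, because the thesaurus links it to `crash`, `freeze`, and `startup`. A practical system would also remove stop words and would likely use a ranking function such as TF-IDF or BM25 instead of raw weighted overlap.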

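Cite this article

Zhang QY, Zhao RL, Wang WW. Prediction of duplicate bug reports based on semantically extended continuous queries. Computer Systems & Applications, 2022, 31(2): 31–39.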

History
  • Received: 2021-04-09
  • Last revised: 2021-05-11
  • Published online: 2022-01-28