Similarity-based Reward Strategy of Reinforcement Learning in CITCP

Fund Project: National Natural Science Foundation of China (61872026)

    Abstract:

    In reinforcement learning approaches to continuous integration test case prioritization (CITCP), the agent rewards test cases to adjust its prioritization strategy for subsequent integration cycles, which suits the frequent iteration and rapid feedback that continuous integration testing demands. The agent usually rewards only failed test cases. In real industrial programs, however, continuous integration testing combines highly frequent integration with a low failure rate, which poses a new challenge for applying CITCP in practice. A low failure rate means that few test cases fail, so few objects are available to reward, and this causes the sparse reward problem in reinforcement learning. This study proposes a reward object selection strategy to address that problem: in addition to rewarding failed test cases, it rewards passing test cases that are similar to failed ones, which increases the number of reward objects. Specifically, a similarity measure is designed over a feature vector representation of each test case's historical execution information sequence and execution time, and the passing test cases similar to the set of failed test cases are selected for reward based on this measure. Experiments on six industrial data sets show that the similarity-based reward object selection strategy solves the sparse reward problem by increasing the number of effective reward objects and further improves the quality of reinforcement learning-based CITCP.
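
    The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact feature encoding, similarity formula, or selection rule, so everything concrete here is an assumption made for illustration: the `CaseRecord` type, the 0/1 encoding of historical verdicts, the duration normalized to [0, 1], the similarity `1 / (1 + Euclidean distance)`, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class CaseRecord:
    """Hypothetical record of one test case (not the paper's data model)."""
    name: str
    history: List[int]  # recent verdicts, assumed encoding: 1 = failed, 0 = passed
    duration: float     # execution time, assumed already normalized to [0, 1]
    failed: bool        # verdict in the current integration cycle


def features(tc: CaseRecord) -> List[float]:
    # Feature vector: historical execution sequence followed by the duration.
    return [float(v) for v in tc.history] + [tc.duration]


def similarity(a: CaseRecord, b: CaseRecord) -> float:
    # Assumed metric: Euclidean distance mapped into (0, 1]; identical vectors give 1.0.
    dist = sqrt(sum((x - y) ** 2 for x, y in zip(features(a), features(b))))
    return 1.0 / (1.0 + dist)


def select_reward_targets(tests: List[CaseRecord], threshold: float = 0.6) -> List[CaseRecord]:
    # Failed test cases are always rewarded; a passing test case is added
    # when it is similar enough to at least one failed test case.
    failed = [t for t in tests if t.failed]
    passed = [t for t in tests if not t.failed]
    similar = [p for p in passed
               if any(similarity(p, f) >= threshold for f in failed)]
    return failed + similar


sample = [
    CaseRecord("t1", [1, 0, 1], 0.5, True),   # failed in this cycle
    CaseRecord("t2", [1, 0, 1], 0.6, False),  # passed, history matches t1
    CaseRecord("t3", [0, 0, 0], 0.1, False),  # passed, history unlike t1
]
print([t.name for t in select_reward_targets(sample)])  # ['t1', 't2']
```

    In this toy data only one test case fails, so a failure-only policy would reward a single object. The sketch also rewards `t2`, whose history and duration are close to the failed `t1`, while `t3` stays unrewarded. A real implementation would need the paper's actual measure and a tuned selection rule.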

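Cite this article

Yang Yang, Pan Chaoyue, Cao Tiange, Li Zheng. Similarity-based Reward Strategy of Reinforcement Learning in CITCP. Computer Systems & Applications, 2022, 31(2): 325-334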

History
  • Received: 2021-04-10
  • Revised: 2021-05-11
  • Published online: 2022-01-28