COMPUTER SYSTEMS APPLICATIONS (English version): 2024, 33(4): 235-245
Counterfactual Enhanced Adversarial Learning for Sequential Recommendation
(1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Computer School, Beijing Information Science and Technology University, Beijing 100101, China)
Received: October 27, 2023    Revised: November 27, 2023
Abstract: Reinforcement learning has recently achieved success in sequential recommendation, as it can learn effective recommendation policies from users' long-term feedback signals. However, the design of the reward function suffers from low discriminability. This limits the model's ability to learn the value differences between different user feedback signals and leaves the recommendation policy consistently suboptimal. Existing work mainly maintains the discriminability of the reward function by tuning the decay (discount) factor, an approach that relies on expert prior knowledge and lacks a theoretical foundation. To design the reward function more reasonably and improve its discriminability, this study analyzes the recommender system from the perspective of causal theory and proposes CAL4Rec, a sequential recommendation algorithm based on counterfactual discriminability enhancement. First, the method describes the sequential recommendation process with a structural causal graph and uses that graph to define a causally identifiable discriminability of value rewards. Second, it optimizes the recommendation policy network through a counterfactual generative adversarial self-supervised learning process in order to learn users' true preferences. Extensive comparison and ablation experiments on a series of sequential recommendation benchmark datasets show that the improvement brought by CAL4Rec holds across multiple network architectures (2.34% on average).
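As a rough illustration of why the decay factor influences how well a cumulative reward separates different feedback sequences, the sketch below compares the discounted returns of two toy sequences under several decay factors. The reward values, the sequences, and the function names are hypothetical, and the gap computed here is only an informal stand-in; it is not the causally identifiable discriminability that CAL4Rec defines.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a feedback sequence (t starts at 0)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def return_gap(seq_a, seq_b, gamma):
    """Absolute difference between the discounted returns of two sequences."""
    return abs(discounted_return(seq_a, gamma) - discounted_return(seq_b, gamma))


# Hypothetical feedback rewards (assumed values, not from the paper):
# 1.0 stands for a purchase, 0.2 for a click.
early_purchase = [1.0, 0.2, 0.2, 0.2]
late_purchase = [0.2, 0.2, 0.2, 1.0]

if __name__ == "__main__":
    for gamma in (0.5, 0.9, 1.0):
        gap = return_gap(early_purchase, late_purchase, gamma)
        print(f"gamma={gamma}: gap={gap:.3f}")
```

In this toy setting the gap shrinks as the decay factor approaches 1, which is one way to see why tuning that factor by hand changes how distinguishable the feedback signals are.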
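Funding: National Natural Science Foundation of China (61672490, 61602436); Key Project of International Cooperation of the Chinese Academy of Sciences (241711KYSB20180002); Sub-project of the National Key Research and Development Program of China (2022YFC3320900)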
Cite this article:
LIU Jia-Lin, HE Ze-Yu, LI Jun. Counterfactual Enhanced Adversarial Learning for Sequential Recommendation. COMPUTER SYSTEMS APPLICATIONS, 2024, 33(4): 235-245