Counterfactual Enhanced Adversarial Learning for Sequential Recommendation

DOI: 10.15888/j.cnki.csa.009470


2024, Vol. 33, No. 4, pp. 235-245

CSTR: 32024.14.csa.009470
Funding: National Natural Science Foundation of China (61672490, 61602436); Key International Cooperation Project of the Chinese Academy of Sciences (241711KYSB20180002); Sub-project of the National Major Research and Development Program (2022YFC3320900)

Authors:

LIU Jia-Lin
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; University of Chinese Academy of Sciences, Beijing 100049, China

HE Ze-Yu
Computer School, Beijing Information Science and Technology University, Beijing 100101, China

LI Jun
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China

Abstract:

Reinforcement learning has recently been applied successfully to sequential recommendation, where it learns effective recommendation policies from long-term user feedback signals. However, the reward functions of such models suffer from low discriminability, which limits the model's ability to learn the value differences among user feedback signals and leads to suboptimal recommendation policies. Existing studies mainly maintain reward discriminability by tuning the discount factor, an approach that relies on expert prior knowledge and lacks a theoretical foundation. To design the reward function on a sounder basis and improve its discriminability, this study analyzes the recommender system from a causal perspective and proposes CAL4Rec, a sequential recommendation algorithm based on counterfactual discriminability enhancement. First, the method describes the sequential recommendation process with a structural causal graph and uses that graph to define a causally identifiable measure of reward discriminability. Second, it optimizes the recommendation policy network through a counterfactual generative adversarial self-supervised learning process in order to learn users' true preferences. Extensive comparative and ablation experiments on a series of sequential recommendation benchmark datasets show that the improvement brought by CAL4Rec holds across multiple network architectures (2.34% on average).

Keywords: counterfactual reasoning; generative adversarial learning; structural causal model; sequential recommendation

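Cite this article:

LIU Jia-Lin, HE Ze-Yu, LI Jun. Counterfactual Enhanced Adversarial Learning for Sequential Recommendation. Computer Systems & Applications, 2024, 33(4): 235-245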

History:

Received: 2023-10-27
Last revised: 2023-11-27
Published online: 2024-03-07
