Automated Negotiation Strategy Based on TD3 Algorithm

doi:10.15888/j.cnki.csa.008973


2023, Vol. 32, No. 3, pp. 15-24

Authors:

CHEN Zuo-Ming
School of Computer Science, South China Normal University, Guangzhou 510631, China

ZHAN Jie-Yu
School of Computer Science, South China Normal University, Guangzhou 510631, China

Fund Project: National Natural Science Foundation of China, Young Scientists Fund (62006085)


Abstract:

Negotiation is the process by which parties communicate on a set of issues in order to reach an agreement. Automated negotiation aims to reduce negotiation cost, improve efficiency, and optimize outcomes by delegating the process to negotiating agents. In recent years, deep reinforcement learning has been applied to automated negotiation with good results, but several problems remain: agents take a long time to train, strategies depend on specific negotiation domains, and the information exchanged during negotiation is not fully exploited. To address these problems, this paper proposes a negotiation strategy based on the TD3 deep reinforcement learning algorithm. Pre-training reduces the exploration cost of the training process. Optimized state and action definitions improve the robustness of the strategy so that it adapts to different negotiation scenarios. A multi-head semantic neural network and an opponent preference prediction module make full use of the interaction information of the negotiation. Experimental results show that the strategy completes the negotiation task well in different negotiation environments.

Key words: automated negotiation; negotiation strategy; deep reinforcement learning; TD3 algorithm; preference prediction
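
The abstract names TD3 as the underlying learning algorithm but gives no implementation details. As background, the following is a minimal sketch of how the generic TD3 algorithm computes its critic target, using target policy smoothing and the minimum of two target critics. It is not the authors' negotiation agent: the scalar action, the function name `td3_target`, and all parameter names and default values are assumptions made for illustration.

```python
import random


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Critic target of generic TD3 for a scalar action (illustrative only)."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = target_actor(next_state) + noise
    # Keep the perturbed action inside the valid action range.
    next_action = max(action_low, min(action_high, next_action))
    # Clipped double Q-learning: take the smaller of the two target critics
    # to reduce overestimation of the action value.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # No bootstrapping from terminal transitions.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * q_next


if __name__ == "__main__":
    # Hypothetical stand-ins for the target networks.
    actor = lambda s: 0.5
    q1 = lambda s, a: 1.0 + a
    q2 = lambda s, a: 2.0
    print(td3_target(0.1, False, None, actor, q1, q2))
```

In a full TD3 loop, both critics are regressed toward this target at every update, while the actor and the target networks are updated less frequently. How the paper maps negotiation states and offers onto these inputs is not described in the abstract.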

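Cite this article

CHEN Zuo-Ming, ZHAN Jie-Yu. Automated Negotiation Strategy Based on TD3 Algorithm. 计算机系统应用 (Computer Systems & Applications), 2023, 32(3): 15-24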


History

Received: 2022-07-27
Revised: 2022-08-26
Published online: 2022-11-18
