Entropy Regularization Proximal Policy Optimization for Federated Client Selection

doi:10.15888/j.cnki.csa.010083

Computer Systems & Applications (计算机系统应用), 2026, Vol. 35, No. 2, pp. 141-153

CSTR: 32024.14.csa.010083
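Funding: National Natural Science Foundation of China, General Program (62271264); Zhejiang Province "Pioneer and Leading Goose + X" (尖兵领雁+X) Major Science and Technology Program (2025C02033)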




Abstract:

In recent years, federated learning (FL), a distributed machine learning paradigm that enables model training while preserving data privacy, has been widely applied in smart healthcare, financial services, the Internet of Things (IoT), and the Internet of Vehicles (IoV). In IoV environments, nodes are highly dynamic and vehicle resources are heterogeneous, so not all clients are suitable for federated training, and an efficient, robust client selection strategy is critical to model performance and system efficiency. Traditional FL methods, however, mostly rely on static or heuristic client selection mechanisms, which adapt poorly to the frequently changing environment states and client characteristics of IoV scenarios. To address this issue, this study proposes a dynamic client selection method based on entropy regularization proximal policy optimization (ERPPO), combined with a confidence-weighted aggregation strategy. The method adds a policy entropy regularization term to the proximal policy optimization (PPO) objective function, which strengthens the exploration of the client selection policy and helps it avoid local optima. In addition, the confidence-based aggregation mechanism adaptively adjusts the aggregation weights according to the variance of client model updates, improving the convergence stability and robustness of the global model. Experimental results show that the proposed method reduces communication overhead while preserving model accuracy and achieves better overall performance than traditional methods in dynamic environments.
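
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption rather than the authors' method: the objective is the standard clipped PPO surrogate with an added entropy bonus, and the aggregation uses inverse-variance weights as one plausible reading of "confidence-weighted". The function names and the parameters `clip_eps`, `entropy_coef`, and `eps` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_entropy_objective(new_prob, old_prob, advantage, probs,
                          clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO surrogate for one sampled action, plus an entropy bonus.

    new_prob / old_prob: probability of the selected client under the
    new / old policy. probs: the new policy's full distribution over
    candidate clients. clip_eps and entropy_coef are illustrative
    defaults, not values taken from the paper.
    """
    ratio = new_prob / old_prob
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    surrogate = min(ratio * advantage, clipped * advantage)
    # The entropy term rewards spread-out policies, which encourages exploration.
    return surrogate + entropy_coef * entropy(probs)


def confidence_weights(update_variances, eps=1e-8):
    """Assumed rule: weight is proportional to 1 / variance, normalized to sum to 1."""
    inverse = [1.0 / (v + eps) for v in update_variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def aggregate(client_updates, update_variances):
    """Confidence-weighted average of client update vectors (lists of floats)."""
    weights = confidence_weights(update_variances)
    dim = len(client_updates[0])
    return [sum(w * u[i] for w, u in zip(weights, client_updates))
            for i in range(dim)]


if __name__ == "__main__":
    # A uniform policy over four clients has higher entropy than a peaked one.
    print(entropy([0.25] * 4), entropy([0.97, 0.01, 0.01, 0.01]))
    # The lower-variance client dominates the aggregated update.
    print(aggregate([[1.0, 0.0], [0.0, 1.0]], [0.1, 0.4]))
```

Under this reading, a larger `entropy_coef` pushes the selection policy toward trying more clients, while the inverse-variance rule lets clients with stable updates contribute more to the global model.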


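Cite this article

CHEN Yutong (陈雨彤), JIN Zilong (金子龙). Entropy Regularization Proximal Policy Optimization for Federated Client Selection. Computer Systems & Applications (计算机系统应用), 2026, 35(2): 141-153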


History

Received: 2025-08-05
Last revised: 2025-09-16
Published online: 2025-12-26
