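Entropy Regularization Proximal Policy Optimization for Federated Client Selection

Fund Project: General Program of the National Natural Science Foundation of China (62271264); Zhejiang Provincial "Pioneer and Leading Goose + X" Major Science and Technology Program (2025C02033)
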
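    Abstract:

    In recent years, federated learning (FL) has emerged as a distributed machine learning paradigm that trains models while preserving data privacy, and it has been widely applied in smart healthcare, financial services, the Internet of Things (IoT), and the Internet of Vehicles (IoV). In IoV environments, nodes are highly dynamic and vehicle resources are heterogeneous, so not every client is suitable for federated training; an efficient and robust client selection strategy is therefore critical to both model performance and system efficiency. However, most traditional FL methods rely on static or heuristic client selection mechanisms that cannot adapt to the frequently changing environment states and client characteristics of IoV scenarios. To address this, this study proposes a dynamic client selection method based on entropy regularization proximal policy optimization (ERPPO), combined with a confidence-weighted aggregation strategy. By adding a policy entropy regularization term to the proximal policy optimization (PPO) objective function, the method strengthens the exploration of the client selection policy and helps it avoid local optima. Meanwhile, the confidence-based aggregation mechanism adaptively adjusts aggregation weights according to the variance of client model updates, improving the convergence stability and robustness of the global model. Experimental results show that, while maintaining model accuracy, the proposed method effectively reduces communication overhead and achieves better overall performance than traditional methods in dynamic environments.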
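
The abstract describes both mechanisms only in words. As a minimal sketch, assume the client-selection policy is a categorical distribution over candidate vehicles, that ERPPO uses the standard clipped PPO surrogate with an additive entropy bonus, L = E_t[min(r_t·A_t, clip(r_t, 1 − ε, 1 + ε)·A_t)] + c·H[π], and that "confidence" means the inverse variance of each client's update vector. The function names (`categorical_entropy`, `ppo_entropy_objective`, `confidence_weighted_aggregate`), the defaults `clip_eps = 0.2` and `entropy_coef = 0.01`, and the inverse-variance weighting rule are all hypothetical; the abstract does not give the authors' exact objective, coefficients, or weighting formula.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def categorical_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H = -sum p*log(p) of a selection distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_entropy_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    entropies: Sequence[float],
    clip_eps: float = 0.2,       # assumed clipping range epsilon (hypothetical)
    entropy_coef: float = 0.01,  # assumed entropy weight c (hypothetical)
) -> float:
    """Clipped PPO surrogate plus an entropy bonus; this value is MAXIMIZED.

    L = mean_t[min(r_t*A_t, clip(r_t, 1-eps, 1+eps)*A_t)] + c * mean_t[H_t],
    with probability ratio r_t = exp(new_logp_t - old_logp_t).
    A gradient-based trainer would minimize -L.
    """
    terms = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (lower) bound so updates stay near the old policy.
        terms.append(min(ratio * adv, clipped * adv))
    # The entropy term rewards a spread-out client-selection distribution,
    # which is what encourages exploration.
    return statistics.fmean(terms) + entropy_coef * statistics.fmean(entropies)


def confidence_weighted_aggregate(
    updates: Sequence[Sequence[float]], eps: float = 1e-8
) -> Tuple[List[float], List[float]]:
    """Hypothetical confidence rule: weight_k proportional to 1 / (Var(update_k) + eps).

    Clients whose update vectors have lower variance receive larger weights.
    Returns (weights, aggregated_update).
    """
    raw = [1.0 / (statistics.pvariance(u) + eps) for u in updates]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(updates[0])
    aggregated = [
        sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)
    ]
    return weights, aggregated


if __name__ == "__main__":
    # Selection policy over 4 candidate vehicles (illustrative numbers only).
    probs = [0.4, 0.3, 0.2, 0.1]
    h = categorical_entropy(probs)
    obj = ppo_entropy_objective(
        new_logp=[math.log(0.4), math.log(0.3)],
        old_logp=[math.log(0.35), math.log(0.35)],
        advantages=[1.0, -0.5],
        entropies=[h, h],
    )
    print(f"entropy={h:.4f} objective={obj:.4f}")
    weights, agg = confidence_weighted_aggregate([[0.0, 2.0], [0.0, 4.0]])
    print(weights, agg)
```

In this sketch, `entropy_coef` trades exploitation against exploration: a larger value makes a flatter selection distribution more attractive, matching the stated goal of avoiding local optima, while setting it to zero recovers plain clipped PPO.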

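Cite this article:

CHEN Yutong (陈雨彤), JIN Zilong (金子龙). Entropy Regularization Proximal Policy Optimization for Federated Client Selection. Computer Systems & Applications, 2026, 35(2): 141-153.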

History
  • Received: 2025-08-05
  • Last revised: 2025-09-16
  • Published online: 2025-12-26

Copyright: Institute of Software, Chinese Academy of Sciences