Abstract: Selecting appropriate optimizers for a federated learning environment is an effective way to improve model performance, especially when the data are highly heterogeneous. This study focuses on the FedAvg and FedALA algorithms and proposes an improved variant, pFedALA. pFedALA reduces the resource waste caused by synchronization requirements by allowing clients to continue local training during waiting periods. The roles of the optimizers in these three algorithms are then analyzed in detail, and the performance of several optimizers, including stochastic gradient descent (SGD), Adam, averaged SGD (ASGD), and AdaGrad, in handling non-independent and identically distributed (non-IID) and imbalanced data is compared on the MNIST and CIFAR-10 datasets. Special attention is given to two data settings: practical heterogeneity based on the Dirichlet distribution and extreme heterogeneity. The experimental results suggest the following observations: 1) pFedALA outperforms FedALA, with an average test accuracy approximately 1% higher than that of FedALA; 2) optimizers commonly used in traditional single-machine deep learning perform markedly differently in a federated learning environment, and compared with other mainstream optimizers, SGD, ASGD, and AdaGrad appear to be more adaptable and robust in this setting.
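
To make the waiting-time-reuse idea concrete, the following toy sketch (not the authors' implementation; the client speeds, the least-squares model, and the synthetic data are hypothetical, and plain FedAvg-style weighted averaging stands in for FedALA's adaptive local aggregation) lets faster clients spend their otherwise idle time at the synchronization barrier on extra local SGD steps.

```python
# Minimal sketch, assuming a synchronous round whose length is set by the
# slowest client; faster clients keep training locally instead of idling.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd_step(w, X, y, lr=0.05):
    """One full-batch gradient step on a least-squares objective."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Hypothetical clients: different data sizes (heterogeneity) and different
# speeds (local steps completed per unit of wall-clock time).
clients = []
for n, speed in [(50, 1.0), (200, 3.0), (100, 2.0)]:
    X = rng.normal(size=(n, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=n)
    clients.append({"X": X, "y": y, "speed": speed})

required_steps = 5                      # local steps each client is asked to run
slowest = min(c["speed"] for c in clients)
round_time = required_steps / slowest   # the slowest client sets the round length

w_global = np.zeros(5)
for _ in range(20):                     # communication rounds
    updates, sizes = [], []
    for c in clients:
        w = w_global.copy()
        # Instead of idling after its required steps, a faster client keeps
        # training locally until the slowest client reaches the barrier.
        steps = int(c["speed"] * round_time)
        for _ in range(steps):
            w = local_sgd_step(w, c["X"], c["y"])
        updates.append(w)
        sizes.append(len(c["y"]))
    # FedAvg-style aggregation weighted by local dataset size (a stand-in for
    # FedALA's adaptive local aggregation, which is not reproduced here).
    w_global = np.average(updates, axis=0, weights=sizes)

print("final global weights:", np.round(w_global, 3))
```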