Federated Learning with Momentum Acceleration in Multi-tier Networks
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Yang, Zhengjie
Abstract
Federated learning (FL) is a fast-developing technique that allows multiple workers to train a global model on a distributed dataset. Conventional FL (FedAvg) relies on plain gradient descent, which can converge slowly. Momentum can improve the situation by adding a momentum step that accelerates convergence. While existing momentum-based FL algorithms have demonstrated desirable performance, they still face challenges related to data heterogeneity, out-of-date momentum, infrequent use of momentum, and disagreement between worker and aggregator momenta. In addition, the advantages of Nesterov Accelerated Gradient (NAG), a variant of momentum that is often more effective, have not been quantified in the context of FL. This thesis investigates how to address these issues efficiently. First, we propose FedNAG, which applies NAG momentum acceleration on workers in every local iteration and aggregates and redistributes both worker models and worker momenta. Second, we introduce FastSlowMo, a novel algorithm that combines worker and aggregator momenta; by applying momentum acceleration on both the workers and the aggregator, FastSlowMo improves the overall efficiency of training. Third, because a three-tier hierarchical architecture reduces the communication burden within local networks, we propose HierMo, a momentum-based FL algorithm that accelerates three-tier FL systems and enhances their performance. Finally, we develop HierAdMo, which dynamically adjusts the momentum factor to mitigate the negative effects of disagreement between workers and edge nodes, ultimately improving long-run performance. In summary, this thesis proposes four novel algorithms to address the challenges above. Theoretical analyses and experimental validations support the effectiveness of these algorithms in real-world FL scenarios.
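The FedNAG idea described in the abstract (NAG steps on each worker in every local iteration, followed by aggregation and redistribution of both models and momenta) can be illustrated with a minimal sketch. This is not the thesis's implementation: the update rule is one common formulation of NAG, and the quadratic per-worker objective, the function names (`nag_step`, `weighted_average`, `fednag_sketch`), and all hyperparameter values are assumptions chosen only for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def nag_step(w: Vector, v: Vector, grad: Vector,
             lr: float, gamma: float) -> Tuple[Vector, Vector]:
    """One NAG update (a common formulation, assumed here):
    v_new = gamma * v - lr * grad
    w_new = w + gamma * v_new - lr * grad
    """
    new_v = [gamma * vi - lr * gi for vi, gi in zip(v, grad)]
    new_w = [wi + gamma * nvi - lr * gi
             for wi, nvi, gi in zip(w, new_v, grad)]
    return new_w, new_v


def weighted_average(vectors: List[Vector], weights: List[float]) -> Vector:
    """Average vectors coordinate-wise, weighted by local dataset size."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(wt * vec[k] for vec, wt in zip(vectors, weights)) / total
            for k in range(dim)]


def quadratic_grad(w: Vector, target: Vector) -> Vector:
    """Gradient of the toy local loss 0.5 * ||w - target||^2 (hypothetical)."""
    return [wi - ti for wi, ti in zip(w, target)]


def fednag_sketch(targets: List[Vector], sizes: List[float],
                  rounds: int = 50, local_steps: int = 5,
                  lr: float = 0.1, gamma: float = 0.5) -> Tuple[Vector, Vector]:
    """Toy two-tier loop: each worker runs local NAG steps, then the
    aggregator averages BOTH models and momenta and sends them back."""
    dim = len(targets[0])
    global_w = [0.0] * dim
    global_v = [0.0] * dim
    for _ in range(rounds):
        local_ws, local_vs = [], []
        for target in targets:
            # Each worker starts from the redistributed model and momentum.
            w, v = list(global_w), list(global_v)
            for _ in range(local_steps):
                w, v = nag_step(w, v, quadratic_grad(w, target), lr, gamma)
            local_ws.append(w)
            local_vs.append(v)
        # Aggregate models and momenta with the same weights.
        global_w = weighted_average(local_ws, sizes)
        global_v = weighted_average(local_vs, sizes)
    return global_w, global_v


if __name__ == "__main__":
    # Two workers with different local optima (a crude stand-in for
    # heterogeneous data) and different dataset sizes.
    model, momentum = fednag_sketch([[1.0, 1.0], [3.0, 5.0]], [1.0, 3.0])
    print(model, momentum)
```

In this toy setting the global objective is minimized at the size-weighted mean of the workers' targets, so the aggregated model should approach that point while the aggregated momentum decays toward zero. Averaging the momenta alongside the models is what distinguishes this loop from a FedAvg-style loop that aggregates models only.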
Date
2023
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney