Semi-Supervised Learning with Unlabeled data: from Centralized to Distributed Systems
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Yang, Nan
Abstract
The rapid increase in data generated by edge devices and IoT technologies demands efficient management solutions, especially in terms of cost and infrastructure. Key challenges include expensive data labeling, which requires significant human resources, and ensuring data privacy and security, given the risk of information leakage during transmission. The limited availability of labeled data versus the exponential growth of new data presents challenges for maintaining accuracy and efficiency in data-driven models. High costs of data annotation, especially when it involves subject-matter experts, limit model training effectiveness. Privacy concerns are heightened by edge devices' interaction with sensitive user data, making the affordability of data labeling and data privacy protection at the edge crucial issues. To tackle these challenges, we proposed Data Augmentation Random Padding to increase the effective data for model training in CNNs, enhancing image classification accuracy. Despite limitations, we developed the Semi-Supervised Learning (SSL) method "AdaptMatch" to utilize a large amount of unlabeled data and a small amount of labeled data in centralized learning, improving learning speed and reducing label bias. To address the inadequacies of centralized data learning, we introduced a decentralized SSL method, Federated Incremental Learning (FedIL), for learning on edge devices while protecting privacy. However, FedIL's lower training efficiency and its struggles with data imbalance led to the development of Federated Masked Autoencoder (FedMAE), which is also a decentralized semi-supervised learning method, based on self-supervised learning. FedMAE enables asynchronous training on large-scale unlabeled images in federated learning, outperforming existing methods in handling highly imbalanced data.
Date
2024
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Electrical and Information Engineering
Awarding institution
The University of Sydney