Data-Efficient and Generalizable Machine Learning in Complex Environments

Xia, Xiaobo

Access status: Open Access

Field	Value	Language
dc.contributor.author	Xia, Xiaobo
dc.date.accessioned	2024-08-13T04:33:53Z
dc.date.available	2024-08-13T04:33:53Z
dc.date.issued	2024	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/32939
dc.description.abstract	In an age marked by an unprecedented influx of data across diverse domains, the quest for effective machine learning (ML) solutions has intensified. However, data imperfections in complex environments, namely defective, redundant, and scarce data, present formidable obstacles. Specifically, defective data, characterized by annotation errors and incompleteness, obstruct the learning process, particularly in critical domains such as healthcare and finance. Redundant data obscure relevant insights, demanding efficient filtering techniques for optimal ML performance. In addition, scarce data, which are prevalent in domains with limited examples, necessitate robust ML models capable of generalizing effectively. Addressing these challenges is pivotal for unlocking the full potential of ML technologies. This thesis offers solutions across three key areas: learning with defective data, redundant data, and scarce data. For defective data, it explores learning with mislabelled and incomplete data and proposes novel methods for handling each scenario. For redundant data, the thesis introduces a moderate coreset selection technique to enhance ML efficiency across diverse practical tasks, and a refined coreset selection strategy to reduce the size of the constructed coreset while maintaining satisfactory model performance. For scarce data, it proposes advanced strategies for kernel mean estimation and for augmenting datasets with marginalized corruption distributions to improve sample efficiency and model generalization. Together, these contributions provide comprehensive insights and solutions for learning with imperfect data. By addressing these obstacles, the thesis promotes the development of data-efficient and generalizable ML and lays the groundwork for advances in fields such as healthcare, finance, and climate science.	en_AU
dc.language.iso	en	en_AU
dc.title	Data-Efficient and Generalizable Machine Learning in Complex Environments	en_AU
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en_AU
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en_AU
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en_AU
usyd.degree	Doctor of Philosophy Ph.D.	en_AU
usyd.awardinginst	The University of Sydney	en_AU
usyd.advisor	Liu, Tongliang