Advances in Imperfect Supervision: From Multiple Unlabeled Sets to Weakly-Annotated Graphs
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Wu, Yuhao
Abstract
Supervised machine learning has been a major driver of progress in artificial intelligence, powering applications in domains such as healthcare and robotics. Supervised methods typically rely on large datasets with accurate labels. In real-world settings, however, such perfectly labeled data is often unavailable because of imperfections in data collection, including limited availability, missing values, and annotation errors. These challenges have led to the development of reliable and robust approaches that can effectively handle imperfect supervision, which commonly arises in three forms: inexact, incomplete, and inaccurate supervision. This thesis investigates advanced topics spanning these three core forms of imperfect supervision. For inexact supervision, we introduce a novel problem setting for binary classification using multiple unlabeled datasets, which relies on minimal and easily obtainable supervision signals. For incomplete supervision, we focus on graph-based positive-unlabeled (PU) learning and reveal how the structural characteristics of graphs can violate key assumptions of conventional PU approaches. For inaccurate supervision, we tackle the problem of label noise in graph data by proposing a topological sample selection approach that leverages graph structure to identify clean and informative nodes more effectively. Together, these contributions advance the understanding and capability of machine learning under imperfect supervision, particularly in structurally complex environments such as graphs. By systematically addressing challenges across inexact, incomplete, and inaccurate supervision, the proposed methodologies bridge theoretical principles with practical implementation and pave the way for more robust, adaptable, and trustworthy learning systems even when training data is coarse, partial, or noisy.
Date
2025
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney