Robust and Efficient Training of Deep Neural Networks via Principled Stopping Strategies
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Yuan, Suqin
Abstract
In many modern regimes, deep neural networks can overfit the training data while still generalizing well, which has enabled large-scale training and motivated neural scaling laws. Nevertheless, benign overfitting is not universal. In practical scenarios, such as learning with noisy labels or tight computational budgets, when to stop training (and, more generally, what to stop training on) remains a consequential and under-explored question. Stopping too late can amplify memorization of spurious patterns and waste computation, while stopping too early can prevent the model from acquiring useful features. This thesis moves toward principled stopping strategies by grounding stopping rules in training dynamics rather than relying on clean validation sets or hand-tuned schedules. We first study learning dynamics through the lens of memorization and forgetting. By tracking prediction trajectories over epochs, we identify a stage transition in which networks begin to substantially fit spurious or mislabeled patterns, accompanied by a distinctive change in aggregate forgetting behavior. Leveraging this transition, we propose validation-free criteria that select a reliable stopping point directly from training-time signals, requiring neither additional data nor expensive preprocessing. Beyond epoch-level decisions, we explore instance-level stopping for efficient and robust optimization. We adopt a perspective that estimates whether an example has been sufficiently learned and adaptively reduces its participation in later training, thereby reallocating computation toward still-unmastered instances. Finally, we show that stopping principles also inform robust learning under imperfect supervision. We investigate a criterion based on the evolution of label consistency to assess whether the model has effectively learned an instance, and use it to identify high-confidence clean samples in the presence of label noise.
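As a rough illustration of the kind of training-time signal the abstract refers to, the sketch below counts forgetting events per epoch from a record of prediction correctness. It assumes the common definition of a forgetting event (an example predicted correctly at one epoch and incorrectly at the next); the function name `forgetting_counts` and the toy `history` data are hypothetical, and this is not the thesis's stopping criterion, which the abstract does not specify.

```python
def forgetting_counts(correct):
    """Count forgetting events between consecutive epochs.

    `correct` is a list of per-epoch lists of booleans, where
    correct[t][i] is True if example i was predicted correctly at epoch t.
    Returns one count per epoch transition (len(correct) - 1 values).
    """
    counts = []
    for prev, curr in zip(correct, correct[1:]):
        # A forgetting event: correct at epoch t, incorrect at epoch t + 1.
        counts.append(sum(1 for p, c in zip(prev, curr) if p and not c))
    return counts


# Toy record (hypothetical data): 4 epochs, 4 training examples.
history = [
    [True, False, False, True],
    [True, True, False, False],
    [True, True, True, False],
    [False, True, True, True],
]
counts = forgetting_counts(history)
```

A stopping rule built on such a signal would watch how the aggregate count evolves across epochs; the specific transition the thesis detects would have to be taken from the thesis itself.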
Date
2026
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney