Exploring Robust Overfitting in Adversarial Training: The Formation, Progression, and Mechanism
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Yu, Chaojian
Abstract
Deep neural networks (DNNs) have achieved remarkable success across various fields but remain highly vulnerable to adversarial attacks, prompting the development of numerous defense techniques. Among them, adversarial training (AT) is recognized as one of the most effective empirical defenses. However, AT suffers from robust overfitting (RO), a dominant phenomenon whose explanation remains unclear. To fill this knowledge gap, this thesis investigates RO in AT, focusing on its formation, progression, and underlying mechanisms. First, we analyze the role of weight perturbation in RO, finding that it compensates for input attacks and that insufficiently strong adversarial attacks facilitate the formation of robust overfitting. To address this, we propose a robust weight perturbation method that mitigates RO and enhances adversarial robustness. Second, we explore the progression of RO by examining small-loss and large-loss adversarial data through data ablation adversarial training. We identify that certain small-loss adversarial data cause RO and propose the minimum loss constrained adversarial training (MLCAT) prototype. Extensive experiments demonstrate that MLCAT effectively mitigates RO and improves robustness. Third, we explore the underlying mechanisms of RO. Using factor ablation adversarial training, we identify that the factor inducing RO stems from natural data. Based on these findings, we provide a holistic understanding of RO from the feature generalization perspective and propose two representative methods to examine this perspective. Lastly, we introduce a strength-adaptive adversarial training approach, which generates adversarial data based on specified loss constraints rather than fixed perturbation budgets. This method prevents RO while maintaining a stable disparity between natural accuracy and adversarial robustness.
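To illustrate the loss-constrained adversarial data generation described in the final contribution, the following is a minimal PyTorch-style sketch, assuming a standard image classifier and cross-entropy loss. The function name, loss threshold, step size, and iteration cap are hypothetical choices for illustration only, not the thesis's actual algorithm.

```python
# Illustrative sketch only: a PGD-style attack that stops once the adversarial
# loss exceeds a specified threshold, instead of projecting onto a fixed
# epsilon-ball. All names and hyperparameters below are assumptions.
import torch
import torch.nn.functional as F

def loss_constrained_attack(model, x, y, loss_target=2.0, step_size=2 / 255, max_steps=20):
    """Perturb x by gradient ascent until the cross-entropy loss reaches
    `loss_target` or `max_steps` iterations have been taken."""
    x_adv = x.clone().detach()
    for _ in range(max_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        if loss.item() >= loss_target:   # specified loss constraint reached
            break
        grad = torch.autograd.grad(loss, x_adv)[0]
        # gradient-ascent step on the input; no fixed perturbation budget
        x_adv = (x_adv.detach() + step_size * grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()
```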
Date
2024
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney