Time-Efficient Evaluation and Enhancement of Adversarial Robustness in Deep Neural Networks
| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lin, Runqi | |
| dc.date.accessioned | 2026-01-29T10:26:13Z | |
| dc.date.available | 2026-01-29T10:26:13Z | |
| dc.date.issued | 2025 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/34789 | |
| dc.description.abstract | With deep neural networks (DNNs) increasingly embedded in modern society, ensuring their safety has become critical and urgent. Substantial efforts have focused on the red–blue adversarial framework, in which the red team identifies vulnerabilities in DNNs and the blue team mitigates them. However, existing approaches from both teams remain computationally intensive, which limits their applicability to large models. To this end, this thesis endeavours to develop time-efficient methods for evaluating and enhancing the adversarial robustness of DNNs. For the red team, we examine the limited transferability of jailbreaking attacks, which undermines efficient multi-model vulnerability assessment via single-shot attack generation. For textual attacks, we show that jailbreaking prompts often overfit the model's sampling processes, and we counter this by dispersing the model's perception to mitigate distributional dependency and enhance transferability. For visual attacks, we reveal that jailbreaking images tend to rely on model-specific features, which confines them to sharp regions and reduces their transferability; our method eliminates this non-generalisable reliance to achieve flattened feasible regions. For the blue team, we study the phenomenon of catastrophic overfitting (CO), which hinders efficient single-step adversarial training. First, we identify abnormal adversarial examples that conflict with the optimisation objective and are closely tied to CO, and we show that suppressing their generation can prevent CO. Second, we uncover that CO stems from the formation of pseudo-robust shortcuts that bypass genuine robustness learning, and we introduce adaptive weight perturbations to disrupt these shortcuts. Third, while natural overfitting, robust overfitting, and CO are typically regarded as distinct phenomena, we discover a shared behaviour whereby DNNs over-memorise specific training patterns that impair generalisation, and we propose a general framework to mitigate this tendency. | en |
| dc.language.iso | en | en |
| dc.subject | Adversarial Robustness | en |
| dc.subject | Jailbreaking Attack | en |
| dc.subject | Adversarial Training | en |
| dc.subject | Black-Box Attack | en |
| dc.subject | Catastrophic Overfitting | en |
| dc.subject | Red–Blue Teaming | en |
| dc.title | Time-Efficient Evaluation and Enhancement of Adversarial Robustness in Deep Neural Networks | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Engineering | en |
| usyd.degree | Doctor of Philosophy Ph.D. | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Liu, Tongliang | |
| usyd.include.pub | No | en |