Time-Efficient Evaluation and Enhancement of Adversarial Robustness in Deep Neural Networks

Lin, Runqi

Access status:

Open Access

Field	Value	Language
dc.contributor.author	Lin, Runqi
dc.date.accessioned	2026-01-29T10:26:13Z
dc.date.available	2026-01-29T10:26:13Z
dc.date.issued	2025	en
dc.identifier.uri	https://hdl.handle.net/2123/34789
dc.description.abstract	With deep neural networks (DNNs) increasingly embedded in modern society, ensuring their safety has become critical and urgent. Substantial efforts have focused on the red–blue adversarial framework, where the red team identifies vulnerabilities in DNNs and the blue team mitigates them. However, existing approaches from both teams remain computationally intensive, constraining their applicability to large models. To this end, this thesis endeavours to develop time-efficient methods for evaluating and enhancing adversarial robustness in DNNs. For the red team, we examine the limited transferability of jailbreaking attacks, which undermines efficient multi-model vulnerability assessment via single-shot attack generation. For textual attacks, we show that jailbreaking prompts often overfit to the model's sampling processes, and we counter this by dispersing the model's perception to mitigate distributional dependency and enhance transferability. For visual attacks, we reveal that jailbreaking images tend to rely on model-specific features, confining them to sharp regions and reducing transferability, whereas our method eliminates non-generalizable reliance to achieve flattened feasible regions. For the blue team, we study the phenomenon of catastrophic overfitting (CO), which hinders efficient single-step adversarial training. First, we identify abnormal adversarial examples that conflict with the optimisation objective and are closely tied to CO, and we show that suppressing their generation can prevent CO. Second, we find that CO stems from the formation of pseudo-robust shortcuts that bypass genuine robustness learning, and we introduce adaptive weight perturbations to disrupt these shortcuts. Third, while natural overfitting, robust overfitting, and CO are typically regarded as distinct phenomena, we discover a shared behaviour whereby DNNs over-memorise specific training patterns that impair generalisation, and we propose a general framework to mitigate this tendency.	en
dc.language.iso	en	en
dc.subject	Adversarial Robustness	en
dc.subject	Jailbreaking Attack	en
dc.subject	Adversarial Training	en
dc.subject	Black-Box Attack	en
dc.subject	Catastrophic Overfitting	en
dc.subject	Red–Blue Teaming	en
dc.title	Time-Efficient Evaluation and Enhancement of Adversarial Robustness in Deep Neural Networks	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Liu, Tongliang
usyd.include.pub	No	en