Field | Value | Language
dc.contributor.author | Lin, Runqi |
dc.date.accessioned | 2026-01-29T10:26:13Z |
dc.date.available | 2026-01-29T10:26:13Z |
dc.date.issued | 2025 | en
dc.identifier.uri | https://hdl.handle.net/2123/34789 |
dc.description.abstract | With deep neural networks (DNNs) increasingly embedded in modern society, ensuring their safety has become critical and urgent. Substantial efforts have focused on the red–blue adversarial framework, where the red team identifies vulnerabilities in DNNs and the blue team mitigates them. However, existing approaches from both teams remain computationally intensive, constraining their applicability to large models. To this end, this thesis endeavours to develop time-efficient methods for evaluating and enhancing adversarial robustness in DNNs. For the red team, we examine the limited transferability of jailbreaking attacks, which undermines efficient multi-model vulnerability assessment via single-shot attack generation. For textual attacks, we show that jailbreaking prompts often overfit the model's sampling processes, and we counter this by dispersing the model's perception to mitigate distributional dependency and enhance transferability. For visual attacks, we reveal that jailbreaking images tend to rely on model-specific features, confining them to high-sharpness regions and reducing transferability, whereas our method eliminates non-generalizable reliance to achieve flattened feasible regions. For the blue team, we study the phenomenon of catastrophic overfitting (CO), which hinders efficient single-step adversarial training. First, we identify abnormal adversarial examples that conflict with the optimisation objective and are closely tied to CO, and we show that suppressing their generation can prevent CO. Second, we uncover that CO stems from the formation of pseudo-robust shortcuts that bypass genuine robustness learning, and we introduce adaptive weight perturbations to disrupt these shortcuts. Third, while natural overfitting, robust overfitting, and CO are typically regarded as distinct phenomena, we discover a shared behaviour whereby DNNs over-memorise specific training patterns that impair generalisation, and we propose a general framework to mitigate this tendency. | en
dc.language.iso | en | en
dc.subject | Adversarial Robustness | en
dc.subject | Jailbreaking Attack | en
dc.subject | Adversarial Training | en
dc.subject | Black-Box Attack | en
dc.subject | Catastrophic Overfitting | en
dc.subject | Red–Blue Teaming | en
dc.title | Time-Efficient Evaluation and Enhancement of Adversarial Robustness in Deep Neural Networks | en
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Engineering | en
usyd.degree | Doctor of Philosophy Ph.D. | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Liu, Tongliang |
usyd.include.pub | No | en

