Efficient Deep Neural Architecture Design and Training
Access status: Open Access
Type: Thesis
Thesis type: Doctor of Philosophy
Author/s: Huang, Tao
Abstract:
Deep learning has revolutionized numerous fields, but its success is often hindered by computational inefficiency, reliance on vast labeled datasets, and challenges in designing optimal architectures. This thesis addresses these issues through contributions in four key areas: handcrafted efficient architecture design, automatic neural architecture evolution, effective knowledge distillation, and data-efficient training. First, we propose LightViT, a lightweight vision transformer, and LocalMamba, a visual state-space model, to advance handcrafted architecture design by balancing accuracy and efficiency. Second, we introduce GreedyNASv2, a method to optimize neural architecture search (NAS), and DyRep, a dynamic re-parameterization framework for evolving architectures during training. Third, our work on knowledge distillation includes DIST for improving logits-based distillation, MasKD for feature-level distillation via adaptive masks, and DiffKD, which unifies logit and feature distillation using diffusion models. Lastly, we tackle the challenge of data efficiency with ActGen, an active generation framework for synthesizing hard examples, and MI-MAE, a self-supervised method leveraging mutual information for masked image modeling. Together, these advancements form a cohesive framework for efficient deep learning, addressing computational, data, and architectural challenges to push the boundaries of scalable and practical machine learning systems.
Date: 2025
Rights statement: The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School: Faculty of Engineering, School of Computer Science
Awarding institution: The University of Sydney