Model-Adaptive Component Designs for Efficient Deep Neural Network Training
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Zheng, Mengyu | |
| dc.date.accessioned | 2026-01-28T03:13:21Z | |
| dc.date.available | 2026-01-28T03:13:21Z | |
| dc.date.issued | 2025 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/34775 | |
| dc.description.abstract | Although some studies have sought to improve model training efficiency, they rarely customize such mechanisms adaptively to the model itself. It is well known that different components and modalities often require distinct learning rate schedules or optimization algorithms to resolve compatibility issues during training. This naturally raises an important question: should training efficiency strategies also be adapted to the interactions among model components? We take a step further by exploring component-collaborative training efficiency strategies. Our study begins by investigating the integration of adaptive components within vision-language models (VLMs), aiming to enhance training efficiency by explicitly exploiting their unique cross-modal interactions. First, we introduce a re-weighting mechanism. The use of dual teachers ensures that each sample receives appropriate learning guidance. Moreover, we propose a novel graph-based multimodal proximity distillation approach. Second, we design a multimodal Mixture-of-Experts that jointly exploits image and text inputs. For each task, only a subset of experts needs to be updated, significantly reducing training time and GPU burden and thereby improving overall efficiency. For unimodal models, our study focuses on lightweight, input/output-level adaptive components that are decoupled from the backbone. Third, we investigate the tokenizer. We propose a simple yet effective tokenizer learning method tailored to each LLM. Customizing a tokenizer for each model incurs much lower computational cost, significantly reducing training costs. Last, we explore visual prompts as a lightweight and efficient adaptation strategy that modifies only the input layer. We propose an Optimal Transport-based Label Mapping strategy to minimize distribution shifts and reduce the adaptation burden on visual prompts. | en |
| dc.language.iso | en | en |
| dc.subject | Deep Learning | en |
| dc.subject | Training Efficiency | en |
| dc.title | Model-Adaptive Component Designs for Efficient Deep Neural Network Training | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en |
| usyd.degree | Doctor of Philosophy Ph.D. | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Xu, Chang | |
| usyd.include.pub | No | en |