Neural Architecture Design and Compression for Efficient Vision Perception
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Guo, Jianyuan
Abstract
In recent years, intelligent systems have evolved significantly, transforming daily life. To maximize their impact, efficient deployment on edge devices—smartphones, smartwatches, robots, and autonomous vehicles—is essential. Deep neural networks, foundational in computer vision, offer powerful feature encoding but demand substantial computational resources, leading to high energy consumption and carbon footprint. This thesis focuses on developing compact yet high-precision deep learning models that balance performance and efficiency. It explores efficient vision backbones and compression techniques to support various tasks while ensuring deployability. We propose a hybrid architecture integrating transformers for global dependencies and CNNs for local feature extraction, replacing traditional components with fully connected layers to enhance efficiency. This design reduces complexity while maintaining accuracy. We further investigate training a unified model for multiple vision tasks through a data-efficient strategy, enabling the model to handle both high- and low-level tasks. Extending to multi-modal learning, we introduce an efficient fusion framework to enhance AI perception in real-world applications. Additionally, we refine knowledge distillation for compact models, reassessing existing methods to improve real-world applicability. Specifically, for object detection, we highlight the overlooked role of background information and propose a decoupled distillation method that enhances performance. This thesis presents practical solutions for lightweight neural networks, enabling AI deployment in resource-constrained environments. By optimizing deep learning models for efficiency, it contributes to the accessibility and sustainability of AI across various domains.
Date
2025
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney