Design Efficient Deep Neural Networks with System Optimization
Access status:
Open Access
Type:
Thesis
Thesis type:
Doctor of Philosophy
Author/s:
Zhang, Zao
Abstract:
The pursuit of enhanced accuracy in Deep Neural Networks (DNNs) has led to increasingly complex model structures, notably in Convolutional Neural Networks (CNNs) and Transformers. While these advancements have propelled the capabilities of intelligent applications, they also introduce significant challenges, primarily an increase in inference latency. This issue is particularly critical in time-sensitive applications, such as self-driving vehicles, where delays can have severe consequences. To address these challenges, this dissertation focuses on optimizing DNNs for efficiency while maintaining, or only minimally impacting, their accuracy. The study is structured into five chapters, each targeting a specific aspect of DNN optimization in the context of CNNs and Transformers: (1) Efficient Model Design for CNNs; (2) System Optimization for CNNs; (3) Efficient Model Design for Transformers; (4) System Optimization for Transformers; (5) Model Compression Methods. Throughout these studies, the emphasis is placed not only on the technical advancements in DNN efficiency but also on the broader implications of these improvements. The research highlights how optimizing DNNs can lead to significant benefits in real-world applications, particularly those requiring real-time processing and operating under resource constraints. By advancing the field of DNN efficiency, this work contributes to the development of more sustainable, accessible, and powerful AI technologies, reinforcing the role of DNNs in the future of intelligent systems.
Date:
2024
Rights statement:
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School:
Faculty of Engineering, School of Electrical and Information Engineering
Awarding institution:
The University of Sydney