Simplification Of Deep Neural Networks For Efficient Inference
Access status: Open Access
Type: Thesis
Thesis type: Doctor of Philosophy
Author/s: Faraone, Julian
Abstract:
In recent years, Deep Neural Networks (DNNs) have become an area of high interest due to their ground-breaking results across many fields and applications. In many of these applications, however, the runtime and memory cost of computing inference matters more than the cost of training the model. Inference is computationally expensive, which makes these models difficult to deploy in constrained hardware environments. This has led to increasing interest in model compression techniques. In this thesis, model compression techniques are presented for achieving efficient representations of DNNs for hardware acceleration. Firstly, a weight pruning technique for achieving unstructured sparse representations of bitwise DNNs is explored on the MNIST and CIFAR-10 datasets, accompanied by a hardware exploration of the resulting representations. Secondly, a hardware-aware filter pruning technique for achieving structured sparse representations of bitwise DNNs is investigated on the ImageNet dataset, and the hardware performance improvements are evaluated via a Field Programmable Gate Array (FPGA) implementation. Thirdly, a quantization method is introduced for training highly accurate bitwise networks with high computational efficiency on the ImageNet dataset; a hardware architecture is designed for this representation and its performance evaluated via FPGA simulations. Lastly, a custom arithmetic is designed that utilizes FPGA-optimized multipliers, together with a training methodology that makes DNN models compatible with the multiplier. Together, this work illustrates the effectiveness of designing DNNs with hardware in mind. In addition, designing customized hardware helps in jointly optimizing accuracy and hardware efficiency. This is very useful for many real-world DNN applications where hardware performance is paramount.
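The abstract does not specify the pruning criterion used for the unstructured sparse representations, so the following is a hedged illustration only: magnitude-based weight pruning, a common baseline for this class of technique. The function name and parameters are illustrative, not taken from the thesis.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until the target
    fraction of entries is zero (unstructured sparsity)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across the whole tensor.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune a random layer to roughly 90% sparsity.
w = np.random.randn(256, 128)
w_sparse = magnitude_prune(w, sparsity=0.9)
print(f"sparsity: {1.0 - np.count_nonzero(w_sparse) / w_sparse.size:.2f}")
```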
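Likewise, the thesis's specific quantizer for bitwise networks is not detailed in the abstract. Bitwise DNNs are commonly trained with sign binarization plus a straight-through estimator (STE); the sketch below shows that standard forward/backward pair under those assumptions, not the thesis's actual method.

```python
import numpy as np

def binarize(w: np.ndarray) -> np.ndarray:
    """Forward pass: constrain weights to {-alpha, +alpha} via sign,
    scaled by the mean absolute value to reduce quantization error."""
    alpha = np.mean(np.abs(w))  # per-tensor scaling factor
    return alpha * np.where(w >= 0, 1.0, -1.0)

def ste_grad(w: np.ndarray, grad_out: np.ndarray) -> np.ndarray:
    """Backward pass: the straight-through estimator passes gradients
    through the non-differentiable sign, clipped where |w| > 1."""
    return grad_out * (np.abs(w) <= 1.0)

# Usage: binarized forward values, STE-filtered backward gradients.
w = np.random.randn(64, 32) * 0.1
wb = binarize(w)
g = ste_grad(w, np.ones_like(w))
```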
Date: 2021
Rights statement: The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School: Faculty of Engineering, School of Electrical and Information Engineering
Awarding institution: The University of Sydney