Field | Value | Language
dc.contributor.author | Zhou, Wenjie |
dc.date.accessioned | 2026-01-15T03:01:26Z |
dc.date.available | 2026-01-15T03:01:26Z |
dc.date.issued | 2025 | en
dc.identifier.uri | https://hdl.handle.net/2123/34707 |
dc.description | Includes publication |
dc.description.abstract | Performance is crucial to the evolution of deep neural networks (DNNs), particularly as their computational requirements surge. As DNN models grow in scale, training cost is becoming a new problem. One of the critical techniques for energy-efficient training is low-precision arithmetic. Block arithmetic is a promising technique that reduces precision requirements and power consumption: it further reduces the word length of each element, while a shared exponent expands the elements' dynamic range. This thesis aims to develop an improved block arithmetic algorithm and implementation methodology. At the arithmetic level, this work investigates the implementation of block arithmetic. At the GEMM kernel level, it examines kernel design under different block arithmetic implementations. For rescaling, it addresses the challenges associated with block arithmetic and introduces a delayed scaling method called the delay update. At the application level, it uses N-BEATS-based inference and training accelerators to demonstrate the advantages of block arithmetic. The contributions of this work are as follows. Firstly, we propose a block minifloat (BM) implementation for inference, the first FPGA-based accelerator using BM arithmetic at the time of publication, demonstrating hardware efficiency and accuracy benefits over integer and floating-point arithmetic on N-BEATS. Secondly, we propose a BM implementation for training, in the form of the first FPGA implementation of 4-bit BM, mixed-precision neural network training of N-BEATS. Thirdly, we propose the delay update method to reduce the rescaling computation in block arithmetic. Empirical studies show that the delay update scheme achieves nearly the same accuracy as the commonly used maximum calibration method, with a significant hardware implementation advantage. | en
dc.language.iso | en | en
dc.subject | block arithmetic | en
dc.subject | neural network training | en
dc.subject | FPGA | en
dc.subject | low-precision | en
dc.subject | microscaling | en
dc.subject | block minifloat | en
dc.title | Block Arithmetic Techniques for the Implementation of Deep Neural Networks | en
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Engineering | en
usyd.degree | Doctor of Philosophy Ph.D. | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Leong, Philip |
usyd.include.pub | Yes | en
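
Illustrative note: the abstract describes block arithmetic as storing short per-element words alongside a shared exponent, and mentions maximum calibration as the commonly used way of choosing the scale. The Python sketch below shows that general idea only, using plain signed-integer mantissas. It is not the thesis's block minifloat format and not its delay update method; the function names `quantize_block` and `dequantize_block` and the `mantissa_bits` parameter are hypothetical and chosen purely for illustration.

```python
import math


def quantize_block(values, mantissa_bits=4):
    """Quantize a block of floats to signed integers that share one exponent.

    Generic block floating-point sketch (an assumption, not the thesis's
    block minifloat format). The shared exponent is chosen by maximum
    calibration: the largest magnitude in the block must fit in the
    signed mantissa range.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0

    q_max = 2 ** (mantissa_bits - 1) - 1  # largest representable mantissa
    # Smallest power-of-two scale such that max_abs / scale <= q_max.
    shared_exp = math.ceil(math.log2(max_abs / q_max))
    scale = 2.0 ** shared_exp

    # Round each element to the nearest mantissa and clamp defensively.
    mantissas = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return mantissas, shared_exp


def dequantize_block(mantissas, shared_exp):
    """Reconstruct approximate floats from mantissas and the shared exponent."""
    scale = 2.0 ** shared_exp
    return [m * scale for m in mantissas]


block = [0.5, -1.75, 3.0, 0.1]
mantissas, shared_exp = quantize_block(block, mantissa_bits=4)
approx = dequantize_block(mantissas, shared_exp)
```

Because every element in the block reuses the same exponent, only the short mantissas are stored per element. The cost is visible in the example: the smallest value, 0.1, falls below the resolution set by the largest value in the block and is rounded to zero.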



There are no previous versions of the item available.