Low Latency and Scalable Machine Learning on FPGA-based System-on-Chip
Access status: Open Access
Type: Thesis
Thesis type: Doctor of Philosophy
Author/s: Lou, Binglei
Abstract:
Machine learning (ML) is highly effective for data analysis, decision-making, and solving complex problems, especially when explicit mathematical models are difficult to derive. Field-Programmable Gate Arrays (FPGAs) provide a powerful platform for ML tasks at the edge, where low latency and real-time responsiveness are essential. However, deploying ML on customized FPGA hardware involves balancing latency, accuracy, and flexibility under resource constraints. This thesis pursues three objectives: (1) enhancing ML accelerator accuracy in an area-efficient manner, (2) integrating these accelerators into unified system-on-chip (SoC) architectures, and (3) developing reconfigurable blocks that adapt to changing environments.

At the circuit level, the thesis introduces LUTEnsemble, a specialized LUT-based architecture for fast, scalable DNN inference. By combining sparsely connected PolyLUT sub-neurons through adder tree structures, LUTEnsemble mitigates the exponential resource scaling of traditional LUT-based DNNs, achieving superior accuracy, latency, and resource efficiency.

At the system level, the thesis applies FPGA-based neural networks to qubit state measurement in trapped-ion quantum processing. Using LUTEnsemble and Vision Transformer (ViT) architectures, the system achieves both low latency and high accuracy: optimized interfacing with an EMCCD camera reduced detection latency by a factor of 119 for single-qubit tests and 94 for three-qubit tests compared with a GPU baseline.

Finally, the thesis proposes a flexible FPGA framework for anomaly detection (fSEAD). It connects partially reconfigurable blocks (pblocks) via an AXI switch, supporting dynamic composition of ensemble results. Experiments on the PYNQ platform showed speed-ups of 3 to 8 times over CPU implementations across four datasets.
This work advances FPGA-based ML design across circuit, system, and tool levels, providing innovative solutions for real-time, resource-efficient, and reconfigurable edge applications.
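The abstract's point about exponential resource scaling can be made concrete with a back-of-the-envelope entry count. The sketch below is an illustrative cost model only, not the thesis's actual formulation: it assumes a fully enumerated LUT neuron with N inputs at beta-bit precision needs a truth table of 2^(N·beta) entries, while splitting it into fan-in-limited sub-neurons combined by an adder tree needs roughly ceil(N/F) tables of 2^(F·beta) entries each. The function names are hypothetical.

```python
import math

def full_lut_entries(n_inputs: int, bits: int) -> int:
    # One fully enumerated LUT neuron: truth-table size is
    # exponential in the total number of input bits.
    return 2 ** (n_inputs * bits)

def ensemble_lut_entries(n_inputs: int, bits: int, fan_in: int) -> int:
    # Illustrative model of the adder-tree decomposition: each
    # sub-neuron sees only `fan_in` inputs, so its table stays small,
    # and the count of tables grows only linearly with n_inputs.
    n_sub = math.ceil(n_inputs / fan_in)
    return n_sub * 2 ** (fan_in * bits)

# Example: 16 inputs at 2-bit precision.
print(full_lut_entries(16, 2))         # 2**32 entries: infeasible on-chip
print(ensemble_lut_entries(16, 2, 4))  # 4 * 2**8 = 1024 entries
```

Under this toy model the decomposition turns a 2^32-entry table into 1024 entries, which is the qualitative effect the abstract attributes to LUTEnsemble.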
Date: 2024
Rights statement: The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School: Faculty of Engineering, School of Electrical and Information Engineering
Awarding institution: The University of Sydney