Show simple item record

Field: Value [Language]
dc.contributor.author: Zhuang, Donglin
dc.date.accessioned: 2022-06-21T05:55:13Z
dc.date.available: 2022-06-21T05:55:13Z
dc.date.issued: 2022 [en_AU]
dc.identifier.uri: https://hdl.handle.net/2123/28876
dc.description.abstract: Machine learning is widely used in application domains such as recommendation, computer vision, and natural language processing. Deep learning inference, the procedure that uses a trained model to perform prediction on unseen data, is commonly accelerated by dedicated SIMD/SIMT accelerators such as GPUs. Existing deep learning inference frameworks use these accelerators in a continuous host-instruct-device fashion: for each operator in a computation graph, the framework runtime launches a computation kernel from the host system, delegating the computation task to the device accelerator. However, this framework runtime often introduces significant execution overhead, especially during inference, and can cause severe underutilization of expensive deep learning accelerators. In this thesis, I depart from the conventional inference design by advocating a pure device-side solution for deep learning inference. Under this scheme, the host only needs to initialize the device computation once per inference, while all remaining computation of the entire neural network is performed purely on the device. The proposed pure device-side solution executes inference workloads with near-zero deep learning framework runtime overhead. In addition, the new scheme exposes many optimization opportunities that cannot be exploited in the current continuous host-instruct-device, multi-kernel-launching inference schemes. Furthermore, I propose and build a new deep learning compiler that performs automatic optimization space exploration and device-side code generation for an arbitrary input model. Finally, I empirically demonstrate that this compiler, named MonoDNN, generates high-performance code that runs up to 2.49X faster than existing state-of-the-art solutions, and 1.91X faster on average, across a wide range of prevalent deep learning models. [en_AU]
dc.subject: Deep Learning [en_AU]
dc.subject: Compilers [en_AU]
dc.subject: Machine Learning Systems [en_AU]
dc.title: MonoDNN: A Deep Learning Compiler for Ultra-Fast Inference with Minimal Runtime Overhead via Holistic Code Optimization and Generation [en_AU]
dc.type: Thesis
dc.type.thesis: Masters by Research [en_AU]
dc.rights.other: The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. [en_AU]
usyd.faculty: SeS faculties schools::Faculty of Engineering::School of Computer Science [en_AU]
usyd.degree: Master of Philosophy (M.Phil) [en_AU]
usyd.awardinginst: The University of Sydney [en_AU]
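
To make the contrast in the abstract concrete, the following minimal CUDA sketch illustrates the two launch schemes: the conventional host-instruct-device style, where the host runtime launches one kernel per operator, and the pure device-side style, where the host launches a single kernel once and the entire (toy) network runs on the device. This is not MonoDNN's actual code; the two-operator "network" (scale, then bias), the kernel names, and the sizes are hypothetical placeholders chosen only to keep the example self-contained.

// Minimal sketch (not MonoDNN's implementation) of per-operator launches
// versus a single device-side launch for a toy two-operator network.
#include <cstdio>
#include <cuda_runtime.h>

#define N 1024

// Conventional host-instruct-device style: one kernel per operator.
__global__ void scale_op(float* x, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) x[i] *= s;
}
__global__ void bias_op(float* x, float b) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) x[i] += b;
}

// Pure device-side style: the host launches once; all operators execute
// inside a single kernel. For simplicity a single thread block is used, so
// __syncthreads() stands in for the host-side kernel boundary.
__global__ void whole_network(float* x, float s, float b) {
    int i = threadIdx.x;
    if (i < N) x[i] *= s;   // operator 1
    __syncthreads();        // replaces a host-driven kernel boundary
    if (i < N) x[i] += b;   // operator 2
}

int main() {
    float* d_x;
    cudaMalloc(&d_x, N * sizeof(float));
    cudaMemset(d_x, 0, N * sizeof(float));

    // Conventional scheme: the host pays launch/runtime overhead at every
    // operator boundary.
    scale_op<<<(N + 255) / 256, 256>>>(d_x, 2.0f);
    bias_op<<<(N + 255) / 256, 256>>>(d_x, 1.0f);

    // Device-side scheme: a single launch; all remaining control flow and
    // computation stay on the GPU, so launch overhead is paid once.
    whole_network<<<1, N>>>(d_x, 2.0f, 1.0f);

    cudaDeviceSynchronize();
    cudaFree(d_x);
    printf("done\n");
    return 0;
}

A real network needs grid-wide synchronization, memory planning, and per-layer code generation between operators, which is what the thesis's compiler produces automatically; this sketch only shows where the host/device launch overhead arises in each scheme.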

