MonoDNN: A Deep Learning Compiler for Ultra-Fast Inference with Minimal Runtime Overhead via Holistic Code Optimization and Generation
Field | Value | Language |
--- | --- | --- |
dc.contributor.author | Zhuang, Donglin | |
dc.date.accessioned | 2022-06-21T05:55:13Z | |
dc.date.available | 2022-06-21T05:55:13Z | |
dc.date.issued | 2022 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/28876 | |
dc.description.abstract | Machine learning has been widely used in various application domains such as recommendation, computer vision, and natural language processing. The procedure that uses a trained model to perform prediction on unseen data, namely deep learning inference, is widely accelerated by dedicated SIMD/SIMT accelerators such as GPUs. Existing deep learning inference frameworks utilize dedicated accelerators in a continuous host-instruct-device fashion: for each operator in a computation graph, the framework runtime launches a computation kernel from the host system, delegating the computation task to the device accelerator. However, these frameworks' runtimes often introduce significant execution overhead, especially in the model inference procedure, and can cause severe underutilization of expensive deep learning accelerators. In this thesis, I challenge the conventional inference designs by demonstrating the superiority of a pure device-side solution in deep learning inference scenarios. Under this scheme, the host only needs to initialize the device computation once per inference, and all remaining computation logic of the entire neural network is performed purely on the device. The proposed pure device-side solution can execute inference workloads with near-zero deep-learning-framework runtime overhead. In addition, the new scheme exposes many new optimization opportunities that are impossible to exploit in the current continuous host-instruct-device, multi-kernel-launching inference schemes. Furthermore, I propose and build a new deep learning compiler, MonoDNN, that performs automatic optimization-space exploration and device-side code generation for an arbitrary input model. Finally, I empirically demonstrate that MonoDNN can generate high-performance code that runs up to 2.49X faster than existing state-of-the-art solutions, with a 1.91X average speedup across a wide range of prevalent deep learning models. | en_AU |
dc.subject | Deep Learning | en_AU |
dc.subject | Compilers | en_AU |
dc.subject | Machine Learning Systems | en_AU |
dc.title | MonoDNN: A Deep Learning Compiler for Ultra-Fast Inference with Minimal Runtime Overhead via Holistic Code Optimization and Generation | en_AU |
dc.type | Thesis | |
dc.type.thesis | Masters by Research | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU |
usyd.degree | Master of Philosophy (M.Phil.) | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
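
The abstract above contrasts the conventional host-instruct-device scheme, where the host launches one kernel per operator, with a pure device-side scheme, where a single launch executes the entire network on the GPU. The sketch below illustrates that contrast on a toy two-layer MLP using CUDA cooperative groups for the device-side barrier; it is a minimal illustration, not MonoDNN's actual generated code, and the layer width, weights, and kernel names are hypothetical placeholders.

```cuda
// A minimal sketch, not MonoDNN's actual generated code: it contrasts the
// conventional one-kernel-per-operator scheme with a single monolithic
// kernel that keeps an entire (toy) network on the device. The two-layer
// MLP, the layer width N, and all weight values are hypothetical.
// Build (grid-wide sync needs relocatable device code):
//   nvcc -arch=sm_70 -rdc=true mono_sketch.cu -o mono_sketch
#include <cstdio>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

#define N 256  // width of every layer in this toy MLP

// Conventional scheme: one kernel per operator; the host launches each one,
// paying launch and scheduling overhead at every operator boundary.
__global__ void dense_relu(const float* W, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float acc = 0.0f;
    for (int j = 0; j < N; ++j) acc += W[i * N + j] * x[j];
    y[i] = acc > 0.0f ? acc : 0.0f;  // fused ReLU
}

// Pure device-side scheme: one cooperative kernel runs both layers; a
// grid-wide barrier replaces the host-side kernel boundary between them.
__global__ void mono_mlp(const float* W1, const float* W2,
                         const float* x, float* h, float* y) {
    cg::grid_group grid = cg::this_grid();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {  // layer 1: h = relu(W1 @ x)
        float acc = 0.0f;
        for (int j = 0; j < N; ++j) acc += W1[i * N + j] * x[j];
        h[i] = acc > 0.0f ? acc : 0.0f;
    }
    grid.sync();  // layer 1 output is fully visible before layer 2 reads it
    if (i < N) {  // layer 2: y = relu(W2 @ h)
        float acc = 0.0f;
        for (int j = 0; j < N; ++j) acc += W2[i * N + j] * h[j];
        y[i] = acc > 0.0f ? acc : 0.0f;
    }
}

int main() {
    float *W1, *W2, *x, *h, *y;
    cudaMallocManaged(&W1, N * N * sizeof(float));
    cudaMallocManaged(&W2, N * N * sizeof(float));
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&h, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));
    for (int i = 0; i < N * N; ++i) { W1[i] = 0.01f; W2[i] = 0.01f; }
    for (int i = 0; i < N; ++i) x[i] = 1.0f;

    // Conventional path: two separate host-side launches, one per operator.
    dense_relu<<<2, 128>>>(W1, x, h);
    dense_relu<<<2, 128>>>(W2, h, y);
    cudaDeviceSynchronize();
    printf("two kernels: y[0] = %f\n", y[0]);

    // Monolithic path: a single cooperative launch covers the whole network.
    // Requires a device with the cooperativeLaunch attribute (Pascal+).
    void* args[] = {&W1, &W2, &x, &h, &y};
    cudaLaunchCooperativeKernel((void*)mono_mlp, dim3(2), dim3(128),
                                args, 0, 0);
    cudaDeviceSynchronize();
    printf("one kernel:  y[0] = %f\n", y[0]);
    return 0;
}
```

Both paths compute identical results, but the monolithic path crosses the host-device boundary once per inference rather than once per operator; the thesis's compiler automates generating this style of single-kernel code, with far more sophisticated scheduling, for arbitrary input models.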