MonoDNN: A Deep Learning Compiler for Ultra-Fast Inference with Minimal Runtime Overhead via Holistic Code Optimization and Generation
Field | Value | Language |
--- | --- | --- |
dc.contributor.author | Zhuang, Donglin | |
dc.date.accessioned | 2022-06-21T05:55:13Z | |
dc.date.available | 2022-06-21T05:55:13Z | |
dc.date.issued | 2022 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/28876 | |
dc.description.abstract | Machine learning has been widely used in various application domains such as recommendation, computer vision, and natural language processing. The procedure that uses a trained model to perform prediction on unseen data, namely deep learning inference, is widely accelerated by dedicated SIMD/SIMT accelerators such as GPUs. Existing deep learning inference frameworks utilize dedicated accelerators in a continuous host-instruct-device fashion: for each operator in a computation graph, the framework runtime launches a computation kernel from the host system, delegating the computation task to the device accelerator. However, these frameworks' runtimes often introduce significant execution overhead, especially in the model inference procedure, and can cause severe underutilization of expensive deep learning accelerators. In this thesis, I challenge the conventional inference designs by demonstrating the superiority of a pure device-side solution in deep learning inference scenarios. Under this scheme, the host only needs to initialize the device computation once per inference, and all remaining computation logic of the entire neural network is performed purely on the device. The proposed pure device-side solution can execute inference workloads with near-zero deep-learning-framework runtime overhead. In addition, the new scheme exposes many new optimization opportunities that are impossible to exploit in the current continuous host-instruct-device, multi-kernel-launching inference schemes. Furthermore, I propose and build a new deep learning compiler, MonoDNN, that performs automatic optimization-space exploration and device-side code generation for an arbitrary input model. Finally, I empirically demonstrate that MonoDNN can generate high-performance code that runs up to 2.49X faster than existing state-of-the-art solutions, with a 1.91X average speedup across a wide range of prevalent deep learning models. | en_AU |
dc.subject | Deep Learning | en_AU |
dc.subject | Compilers | en_AU |
dc.subject | Machine Learning Systems | en_AU |
dc.title | MonoDNN: A Deep Learning Compiler for Ultra-Fast Inference with Minimal Runtime Overhead via Holistic Code Optimization and Generation | en_AU |
dc.type | Thesis | |
dc.type.thesis | Masters by Research | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU |
usyd.degree | Master of Philosophy (M.Phil.) | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
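
The abstract above contrasts the conventional host-instruct-device scheme, where the host launches one kernel per operator, with a pure device-side scheme, where a single launch executes the entire network on the GPU. The sketch below illustrates that contrast on a toy two-layer MLP using CUDA cooperative groups for the device-side barrier; it is a minimal illustration, not MonoDNN's actual generated code, and the layer width, weights, and kernel names are hypothetical placeholders.

```cuda
// A minimal sketch, not MonoDNN's actual generated code: it contrasts the
// conventional one-kernel-per-operator scheme with a single monolithic
// kernel that keeps an entire (toy) network on the device. The two-layer
// MLP, the layer width N, and all weight values are hypothetical.
// Build (grid-wide sync needs relocatable device code):
//   nvcc -arch=sm_70 -rdc=true mono_sketch.cu -o mono_sketch
#include <cstdio>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

#define N 256  // width of every layer in this toy MLP

// Conventional scheme: one kernel per operator; the host launches each one,
// paying launch and scheduling overhead at every operator boundary.
__global__ void dense_relu(const float* W, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float acc = 0.0f;
    for (int j = 0; j < N; ++j) acc += W[i * N + j] * x[j];
    y[i] = acc > 0.0f ? acc : 0.0f;  // fused ReLU
}

// Pure device-side scheme: one cooperative kernel runs both layers; a
// grid-wide barrier replaces the host-side kernel boundary between them.
__global__ void mono_mlp(const float* W1, const float* W2,
                         const float* x, float* h, float* y) {
    cg::grid_group grid = cg::this_grid();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {  // layer 1: h = relu(W1 @ x)
        float acc = 0.0f;
        for (int j = 0; j < N; ++j) acc += W1[i * N + j] * x[j];
        h[i] = acc > 0.0f ? acc : 0.0f;
    }
    grid.sync();  // layer 1 output is fully visible before layer 2 reads it
    if (i < N) {  // layer 2: y = relu(W2 @ h)
        float acc = 0.0f;
        for (int j = 0; j < N; ++j) acc += W2[i * N + j] * h[j];
        y[i] = acc > 0.0f ? acc : 0.0f;
    }
}

int main() {
    float *W1, *W2, *x, *h, *y;
    cudaMallocManaged(&W1, N * N * sizeof(float));
    cudaMallocManaged(&W2, N * N * sizeof(float));
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&h, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));
    for (int i = 0; i < N * N; ++i) { W1[i] = 0.01f; W2[i] = 0.01f; }
    for (int i = 0; i < N; ++i) x[i] = 1.0f;

    // Conventional path: two separate host-side launches, one per operator.
    dense_relu<<<2, 128>>>(W1, x, h);
    dense_relu<<<2, 128>>>(W2, h, y);
    cudaDeviceSynchronize();
    printf("two kernels: y[0] = %f\n", y[0]);

    // Monolithic path: a single cooperative launch covers the whole network.
    // Requires a device with the cooperativeLaunch attribute (Pascal+).
    void* args[] = {&W1, &W2, &x, &h, &y};
    cudaLaunchCooperativeKernel((void*)mono_mlp, dim3(2), dim3(128),
                                args, 0, 0);
    cudaDeviceSynchronize();
    printf("one kernel:  y[0] = %f\n", y[0]);
    return 0;
}
```

Both paths compute identical results, but the monolithic path crosses the host-device boundary once per inference rather than once per operator; the thesis's compiler automates generating this style of single-kernel code, with far more sophisticated scheduling, for arbitrary input models.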