Integrated Transformers Inference Framework for Multiple Tenants on GPU
| Field | Value | Language |
| --- | --- | --- |
dc.contributor.author | Zhang, Yuning | |
dc.date.accessioned | 2023-08-28T07:16:24Z | |
dc.date.available | 2023-08-28T07:16:24Z | |
dc.date.issued | 2023 | en_AU |
dc.identifier.uri | https://hdl.handle.net/2123/31606 | |
dc.description.abstract | In recent years, Transformer models have gained prominence in the deep learning domain, serving as the foundation for a wide array of applications, including Natural Language Processing (NLP) and Computer Vision (CV). These models have become essential for numerous inference tasks, but serving them often suffers from poor GPU utilization and low system throughput. Current GPU-based inference frameworks typically treat each model in isolation, which results in suboptimal resource management and reduced performance. To address these limitations, we introduce ITIF: Integrated Transformers Inference Framework for multiple tenants with a shared backbone. ITIF allows multiple tenants to share a single backbone Transformer model on a single GPU, consolidating operators across the tenants' inference models. This approach significantly improves GPU utilization and system throughput. ITIF marks a considerable step towards more efficient deep learning serving, particularly for large-scale cloud providers hosting many models with a shared backbone. In our experiments, we extensively evaluated ITIF against traditional baselines on a variety of deep learning tasks, including NLP and CV workloads, and found that ITIF consistently outperformed the baselines, with performance improvements of up to 2.40×. In conclusion, our research highlights the benefits of the ITIF framework for improving the efficiency and scalability of Transformer-based deep learning systems. By enabling multiple tenants to share a single backbone model, ITIF offers a practical solution to the challenges large-scale cloud providers face in optimizing GPU utilization and system throughput, and a promising direction for further research and development in deep learning serving. | en_AU |
dc.language.iso | en | en_AU |
dc.subject | Transformer Inference | en_AU |
dc.subject | Multiple Tenants | en_AU |
dc.subject | Parallel Processing | en_AU |
dc.subject | GPU | en_AU |
dc.title | Integrated Transformers Inference Framework for Multiple Tenants on GPU | en_AU |
dc.type | Thesis | |
dc.type.thesis | Masters by Research | en_AU |
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU |
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Electrical and Information Engineering | en_AU |
usyd.degree | Master of Philosophy (M.Phil) | en_AU |
usyd.awardinginst | The University of Sydney | en_AU |
usyd.advisor | Yuan, Dong |
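The shared-backbone idea described in the abstract can be illustrated with a small sketch. This is not ITIF's implementation; it is a toy Python/NumPy model of the core concept: requests from several tenants are consolidated into one backbone forward pass (one set of fused operators), after which each tenant's own lightweight head is applied. All names, shapes, and the single-layer "backbone" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16

def backbone(x):
    """Stand-in for a shared Transformer backbone (one dense layer here).
    In the multi-tenant setting this single call replaces N separate
    per-model backbone passes."""
    W = np.full((x.shape[-1], HIDDEN), 0.1)  # shared backbone weights
    return np.tanh(x @ W)

# Per-tenant heads: each tenant owns only a small task-specific output layer.
tenant_heads = {
    "tenant_a": rng.standard_normal((HIDDEN, 3)),
    "tenant_b": rng.standard_normal((HIDDEN, 5)),
}

def serve_batched(requests):
    """requests: list of (tenant_id, input_vector).
    Consolidates all inputs into ONE backbone call, then routes each
    feature row to the owning tenant's head."""
    tenants, inputs = zip(*requests)
    features = backbone(np.stack(inputs))     # single consolidated pass
    return [features[i] @ tenant_heads[t] for i, t in enumerate(tenants)]

reqs = [("tenant_a", rng.standard_normal(8)),
        ("tenant_b", rng.standard_normal(8)),
        ("tenant_a", rng.standard_normal(8))]
outs = serve_batched(reqs)
print([o.shape for o in outs])  # → [(3,), (5,), (3,)]
```

Because the backbone weights are shared, batching the three requests through one pass yields exactly the same per-tenant outputs as serving each request through its own model copy, while launching far fewer GPU kernels; this equivalence is the premise behind consolidating operators across tenants.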