Show simple item record

FieldValueLanguage
dc.contributor.authorSodsong, Wasuwee
dc.date.accessioned2018-03-16
dc.date.available2018-03-16
dc.date.issued2017-08-24
dc.identifier.urihttp://hdl.handle.net/2123/17987
dc.descriptionA Dissertation submitted to the Department of Computer Science and the Graduate School of Yonsei University, and the School of Information Technologies at The University of Sydney (Cotutelle), in partial fulfillment of the requirements for the degree of Doctor of Philosophyen_AU
dc.description.abstractIn the past decade, graphics processing units (GPUs) have gained wide-spread use as general purpose hardware accelerators. Equipped with several thousand cores, GPUs are suitable for data-intensive operations. Although a GPU provides a vast amount of raw, parallel compute power, it is nevertheless a daunting task to fully utilize this hardware resource. Doing so requires an in-depth understanding of the GPU architecture to utilize the software-exposed GPU memory hierarchy, and to mitigate main memory latencies. Because a GPU lacks complex control units, it under-performs for tasks with complex control flow. Control-flow intensive operations are thus more efficiently computed on a CPU. In contrast, CPUs lack ALUs and thus under-perform with data-intensive operations. In practice, we find applications to be composed of a mix of data-intensive operations and operations with complex control-flow. Heterogeneous computing aims at utilizing both the CPU and the GPU of a system. It offers the advantage of leveraging the key strengths of both architectures, while diminishing their weaknesses. This thesis proposes code-partitioning, which considers application characteristics and the capabilities of the underlying hardware to assign computations to either the CPU or the GPU. Dynamic scheduling techniques are proposed to leverage pipeline-parallelism and load-balance the work-load on a heterogeneous architecture. The proposed code-partitioning technique is applied with two major applications, JPEG decompression and Kronecker algebra-operations. The entropy decoding of JPEG decompression is difficult to parallelize because codewords are of variable length, and the start-position of a codeword in the bitstream is not known before the previous codeword has been decoded. The remaining JPEG decoding steps are compute-intensive with few dependencies. Similarly, Kronecker algebra, which has been shown to be effective with static program analysis, consists of data-intensive matrix operations. However, it has cross-iteration dependencies, such as bookkeeping of visited nodes, which is unsuitable for GPU computing. Despite improvement potential with a heterogeneous system, the domination of the JPEG format and the usefulness of Kronecker algebra, no approaches exist yet that are capable of joining forces of a system’s CPU and GPU. We investigate parallelization strategies that use heterogeneous multicores for JPEG decompression and Kronecker algebra. We propose algorithm-specific optimizations that minimize the known sequential bottlenecks. Our code-partitioning and scheduling scheme exploits task, data, and pipeline parallelism. We introduce an offline profiling step to determine the performance of a system’s CPU and GPU such that workloads are distributed accordingly. These applications are evaluated on several heterogeneous platforms, including an embedded system (for JPEG decompression). From the “lessons learned”, parallel software design patterns for heterogeneous computing have been distilled and put to work with the two major applications of this thesis.en_AU
dc.rightsThe author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.en_AU
dc.subjectHeterogenous multicoresen_AU
dc.subjectGPU programmingen_AU
dc.subjectJPEG decodingen_AU
dc.subjectKronecker algebraen_AU
dc.titleParallelization Techniques for Heterogeneous Multicores with Applicationsen_AU
dc.typeThesisen_AU
dc.type.thesisDoctor of Philosophyen_AU
usyd.facultyFaculty of Engineering and Information Technologies, School of Information Technologiesen_AU
usyd.degreeDoctor of Philosophy Ph.D.en_AU
usyd.awardinginstThe University of Sydneyen_AU
usyd.awardinginstYonsei University Koreaen_AU


Show simple item record

Associated file/s

Associated collections

Show simple item record

There are no previous versions of the item available.