Field | Value | Language
dc.contributor.author | Wu, Xiaoxiang |
dc.date.accessioned | 2026-03-30T01:52:01Z |
dc.date.available | 2026-03-30T01:52:01Z |
dc.date.issued | 2026 | en
dc.identifier.uri | https://hdl.handle.net/2123/35044 |
dc.description.abstract | Chapter 2 studies persistent key-value stores and isolates the impact of individual design techniques within a unified code base. Unlike prior work that evaluates complete systems, our methodology enables an apples-to-apples comparison of trade-offs. We show that random allocation achieves performance comparable to log-structured persistence while avoiding garbage-collection latency spikes; that persistent CPU caches, such as Extended Asynchronous DRAM Refresh or Compute Express Link global flush, often hinder rather than help performance, necessitating explicit flushes; and that recovery mechanisms require careful handling of allocator metadata, with transactions imposing nontrivial overhead. Chapter 3 introduces the concept of software pre-storing, the converse of prefetching, which issues instructions to proactively move data down the memory hierarchy. Implemented via existing CPU instructions, pre-storing benefits write-intensive workloads, especially on architectures with heterogeneous memories such as PMem or CXL-attached DRAM. We develop DirtBuster, a tool that identifies applications and code regions where pre-storing is beneficial. Evaluations on ARM and x86 systems with PMem and cache-coherent DRAM demonstrate performance improvements of up to 2.3× across key-value stores, HPC applications, message-passing systems, and TensorFlow. Chapter 4 examines unified memory architectures that combine high-bandwidth access with a coherent, shared address space, thereby addressing the limitations of conventional iGPU (bandwidth-bound) and dGPU (PCIe-bound) designs. Using a state-of-the-art unified memory architecture platform, we characterize performance under diverse workloads, identify scenarios where unified memory architectures excel, and reveal the costs of fully shared memory. Our analysis provides practical guidelines for memory management in unified memory architecture systems and highlights their significant potential for balanced CPU–GPU workloads. | en
dc.language.iso | en | en
dc.subject | Heterogeneous systems | en
dc.subject | Persistent memory | en
dc.subject | Unified memory | en
dc.subject | CPU caches | en
dc.subject | Pre-store | en
dc.subject | Pre-fetch | en
dc.title | Optimizing the Use of Heterogeneous Memory | en
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en
usyd.degree | Doctor of Philosophy Ph.D. | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Zwaenepoel, Willy |
usyd.include.pub | No | en

