Optimizing the Use of Heterogeneous Memory

Wu, Xiaoxiang

Access status:

Open Access

Field	Value	Language
dc.contributor.author	Wu, Xiaoxiang
dc.date.accessioned	2026-03-30T01:52:01Z
dc.date.available	2026-03-30T01:52:01Z
dc.date.issued	2026	en
dc.identifier.uri	https://hdl.handle.net/2123/35044
dc.description.abstract	Chapter 2 studies persistent key-value stores and isolates the impact of individual design techniques within a unified code base. Unlike prior work that evaluates complete systems, our methodology enables an apples-to-apples comparison of trade-offs. We show that random allocation achieves performance comparable to log-structured persistence while avoiding garbage-collection latency spikes; that persistent CPU caches, such as Extended Asynchronous DRAM Refresh or Compute Express Link global flush, often hinder rather than help performance, necessitating explicit flushes; and that recovery mechanisms require careful handling of allocator metadata, with transactions imposing nontrivial overhead. Chapter 3 introduces the concept of software pre-storing, the converse of prefetching, which issues instructions to proactively move data down the memory hierarchy. Implemented via existing CPU instructions, pre-storing benefits write-intensive workloads, especially on architectures with heterogeneous memories such as PMem or CXL-attached DRAM. We develop DirtBuster, a tool that identifies applications and code regions where pre-storing is beneficial. Evaluations on ARM and x86 systems with PMem and cache-coherent DRAM demonstrate performance improvements of up to 2.3× across key-value stores, HPC applications, message-passing systems, and TensorFlow. Chapter 4 examines unified memory architectures that combine high-bandwidth access with a coherent, shared address space, thereby addressing the limitations of conventional iGPU (bandwidth-bound) and dGPU (PCIe-bound) designs. Using a state-of-the-art unified memory architecture platform, we characterize performance under diverse workloads, identify scenarios where unified memory architectures excel, and reveal the costs of fully shared memory.
Our analysis provides practical guidelines for memory management in unified memory architecture systems and highlights their significant potential for balanced CPU–GPU workloads.	en
dc.language.iso	en	en
dc.subject	Heterogeneous systems	en
dc.subject	Persistent memory	en
dc.subject	Unified memory	en
dc.subject	CPU caches	en
dc.subject	Pre-store	en
dc.subject	Pre-fetch	en
dc.title	Optimizing the Use of Heterogeneous Memory	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Zwaenepoel, Willy
usyd.include.pub	No	en