Understanding and Improving the Performance of Read Operations Across the Storage Stack

Borge Chavez, Maria Fernanda

Access status:

Open Access

Field	Value	Language
dc.contributor.author	Borge Chavez, Maria Fernanda
dc.date.accessioned	2020-01-14
dc.date.available	2020-01-14
dc.date.issued	2019-12-01
dc.identifier.uri	https://hdl.handle.net/2123/21671
dc.description.abstract	We live in a data-driven era, large amounts of data are generated and collected every day. Storage systems are the backbone of this era, as they store and retrieve data. To cope with increasing data demands (e.g., diversity, scalability), storage systems are experiencing changes across the stack. As other computer systems, storage systems rely on layering and modularity, to allow rapid development. Unfortunately, this can hinder performance clarity and introduce degradations (e.g., tail latency), due to unexpected interactions between components of the stack. In this thesis, we first perform a study to understand the behavior across different layers of the storage stack. We focus on sequential read workloads, a common I/O pattern in distributed le systems (e.g., HDFS, GFS). We analyze the interaction between read workloads, local le systems (i.e., ext4), and storage media (i.e., SSDs). We perform the same experiment over different periods of time (e.g., le lifetime). We uncover 3 slowdowns, all of which occur in the lower layers. When combined, these slowdowns can degrade throughput by 30%. We find that increased parallelism on the local le system mitigates these slowdowns, showing the need for adaptability in storage stacks. Given the fact that performance instabilities can occur at any layer of the stack, it is important that upper-layer systems are able to react. We propose smart hedging, a novel technique to manage high-percentile (tail) latency variations in read operations. Smart hedging considers production challenges, such as massive scalability, heterogeneity, and ease of deployment and maintainability. Our technique establishes a dynamic threshold by tracking latencies on the client-side. If a read operation exceeds the threshold, a new hedged request is issued, in an exponential back-off manner. We implement our technique in HDFS and evaluate it on 70k servers in 3 datacenters. Our technique reduces average tail latency, without generating excessive system load.	en_AU
dc.rights	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en_AU
dc.subject	performance	en_AU
dc.subject	storage	en_AU
dc.subject	computer systems	en_AU
dc.title	Understanding and Improving the Performance of Read Operations Across the Storage Stack	en_AU
dc.type	Thesis	en_AU
dc.type.thesis	Masters by Research	en_AU
usyd.faculty	Faculty of Engineering, School of Computer Science	en_AU
usyd.degree	Master of Philosophy M.Phil	en_AU
usyd.awardinginst	The University of Sydney	en_AU