Application profiling and resource management for MapReduce
Access status:
Open Access
Type
ThesisThesis type
Doctor of PhilosophyAuthor/s
Lu, PengAbstract
Scale of data generated and processed is exponential growth in the Big Data ear. It poses a challenge that is far beyond the goal of a single computing system. Processing such vast amount of data on a single machine is impracticable in term of time or cost. Hence, distributed ...
See moreScale of data generated and processed is exponential growth in the Big Data ear. It poses a challenge that is far beyond the goal of a single computing system. Processing such vast amount of data on a single machine is impracticable in term of time or cost. Hence, distributed systems, which can harness very large clusters of commodity computers and processing data within restrictive time deadlines, are imperative. In this thesis, we target two aspects of distributed systems: application profiling and resource management. We study a MapReduce system in detail, which is a programming paradigm for large scale distributed computing, and presents solutions to tackle three key problems. Firstly, this thesis analyzes the characteristics of jobs running on the MapReduce system to reveal the problem—the Application scope of MapReduce has been extended beyond the original design goal that was large-scale data processing. This problem enables us to present a Workload Characteristic Oriented Scheduler (WCO), which strives for co-locating tasks of possibly different MapReduce jobs with complementing resource usage characteristics. Secondly, this thesis studies the current job priority mechanism focusing on resource management. In the MapReduce system, job priority only exists at scheduling level. High priority jobs are placed at the front of the scheduling queue and dispatched first. Resource, however, is fairly shared among jobs running at the same worker node without any consideration for their priorities. In order to resolve this, this thesis presents a non-intrusive slot layering solution, which dynamically allocates resource between running jobs based on their priority and efficiently reduces the execution time of high priority jobs while improves overall throughput. Last, based on the fact of underutilization of resource at each individual worker node, this thesis propose a new way, Local Resource Shaper (LRS), to smooth resource consumption of each individual job by automatically tuning the execution of concurrent jobs to maximize resource utilization while minimizing resource contention.
See less
See moreScale of data generated and processed is exponential growth in the Big Data ear. It poses a challenge that is far beyond the goal of a single computing system. Processing such vast amount of data on a single machine is impracticable in term of time or cost. Hence, distributed systems, which can harness very large clusters of commodity computers and processing data within restrictive time deadlines, are imperative. In this thesis, we target two aspects of distributed systems: application profiling and resource management. We study a MapReduce system in detail, which is a programming paradigm for large scale distributed computing, and presents solutions to tackle three key problems. Firstly, this thesis analyzes the characteristics of jobs running on the MapReduce system to reveal the problem—the Application scope of MapReduce has been extended beyond the original design goal that was large-scale data processing. This problem enables us to present a Workload Characteristic Oriented Scheduler (WCO), which strives for co-locating tasks of possibly different MapReduce jobs with complementing resource usage characteristics. Secondly, this thesis studies the current job priority mechanism focusing on resource management. In the MapReduce system, job priority only exists at scheduling level. High priority jobs are placed at the front of the scheduling queue and dispatched first. Resource, however, is fairly shared among jobs running at the same worker node without any consideration for their priorities. In order to resolve this, this thesis presents a non-intrusive slot layering solution, which dynamically allocates resource between running jobs based on their priority and efficiently reduces the execution time of high priority jobs while improves overall throughput. Last, based on the fact of underutilization of resource at each individual worker node, this thesis propose a new way, Local Resource Shaper (LRS), to smooth resource consumption of each individual job by automatically tuning the execution of concurrent jobs to maximize resource utilization while minimizing resource contention.
See less
Date
2015-04-22Faculty/School
Faculty of Engineering and Information Technologies, School of Information TechnologiesAwarding institution
The University of SydneyShare