
A profiling and performance analysis based self-tuning system for optimization of Hadoop MapReduce cluster configuration

dc.creator: Wu, Dili
dc.date.accessioned: 2020-08-22T00:22:34Z
dc.date.available: 2014-04-17
dc.date.issued: 2013-04-17
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-04022013-104900
dc.identifier.uri: http://hdl.handle.net/1803/11941
dc.description.abstract: As a parallel data processing framework, MapReduce has been one of the most popular topics in the age of cloud computing since it was first proposed by Google. Despite its advantages, such as scalability, reliability, and flexibility, managing the resources of a MapReduce cluster so as to optimize the performance of the MapReduce applications running on it remains a major open issue in this field. This thesis introduces PPABS, a profiling and performance analysis based system that optimizes the performance of a Hadoop cluster by automatically tuning its configuration settings. PPABS works in four steps. First, profiling of MapReduce job performance is combined with data mining techniques to dynamically divide jobs into groups. Second, Simulated Annealing, a probabilistic metaheuristic algorithm for global optimization, is adapted to find the optimum cluster configuration for each job group obtained in the first step. Third, an incoming job is run on only a small part of its input data set, and a pattern recognition technique is used to classify the new job into one of the groups. Finally, PPABS updates the cluster configuration to match the job's features before the whole job is run. The experimental results were promising and showed that the approach improved the performance of several Hadoop jobs running on a cluster built on Amazon EC2.
dc.format.mimetype: application/pdf
dc.subject: Self-Tuning
dc.subject: Performance Optimization
dc.subject: MapReduce
dc.subject: Cluster Configuration
dc.subject: Hadoop
dc.title: A profiling and performance analysis based self-tuning system for optimization of Hadoop MapReduce cluster configuration
dc.type: thesis
dc.contributor.committeeMember: Dr. Yi Cui
dc.type.material: text
thesis.degree.name: MS
thesis.degree.level: thesis
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2014-04-17
local.embargo.lift: 2014-04-17
dc.contributor.committeeChair: Dr. Aniruddha Gokhale
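
Illustrative note: the abstract describes using Simulated Annealing to search for a good cluster configuration for each job group. The sketch below shows the general technique only and is not the thesis's implementation. The search space, the choice of Hadoop 1.x parameter names (`io.sort.mb`, `io.sort.factor`, `mapred.reduce.tasks`), the candidate values, the cooling schedule, and the `toy_runtime` cost function are all assumptions made for illustration; in a real system the cost would presumably be a measured or predicted job runtime.

```python
import math
import random

# Hypothetical search space (an assumption, not taken from the thesis):
# parameter name -> candidate values.
SEARCH_SPACE = {
    "io.sort.mb": [50, 100, 150, 200, 250],
    "io.sort.factor": [10, 25, 50, 100],
    "mapred.reduce.tasks": [1, 2, 4, 8, 16],
}


def toy_runtime(config):
    """Synthetic stand-in for a job runtime in seconds (lower is better)."""
    return (
        abs(config["io.sort.mb"] - 200) * 0.5
        + abs(config["io.sort.factor"] - 50) * 0.2
        + abs(config["mapred.reduce.tasks"] - 8) * 3.0
        + 60.0
    )


def random_neighbor(config, rng):
    """Return a copy of config with one parameter moved to a different value."""
    neighbor = dict(config)
    name = rng.choice(sorted(SEARCH_SPACE))
    choices = [v for v in SEARCH_SPACE[name] if v != config[name]]
    neighbor[name] = rng.choice(choices)
    return neighbor


def simulated_annealing(cost, rng, t_start=50.0, t_end=0.01,
                        cooling=0.95, steps_per_temp=20):
    """Minimize cost over SEARCH_SPACE with a geometric cooling schedule."""
    current = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    current_cost = cost(current)
    best, best_cost = dict(current), current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_temp):
            candidate = random_neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept a worse candidate with
            # probability exp(-delta / T), which shrinks as T cools.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = dict(current), current_cost
        temperature *= cooling
    return best, best_cost


if __name__ == "__main__":
    best, best_cost = simulated_annealing(toy_runtime, random.Random(0))
    print(best, best_cost)
```

Accepting occasional worse configurations at high temperature is what lets the search escape local minima; tracking `best` separately ensures the returned configuration is the best one visited, not merely the last one accepted.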

