Data analytics is at the foundation of both high-quality products and services in modern economies and societies. Big data workloads run on complex large-scale computing clusters, which implies significant challenges for deeply understanding and characterizing overall system performance. In general, performance is affected by many factors at multiple layers in the system stack, hence it is challenging to identify the key metrics when understanding big data workload performance. In this paper, we propose a novel workload characterization methodology using ensemble learning, called Metric Importance Analysis (MIA), to quantify the respective importance of workload metrics. By focusing on the most important metrics, MIA reduces the complexity of the analysis without losing information. Moreover, we develop the MIA-based Kiviat Plot (MKP) and Benchmark Similarity Matrix (BSM) which provide more insightful information than the traditional linkage clustering based dendrogram to visualize program behavior (dis)similarity. To demonstrate the applicability of MIA, we use it to characterize three big data benchmark suites: HiBench, CloudRank-D and SZTS. The results show that MIA is able to characterize complex big data workloads in a simple, intuitive manner, and reveal interesting insights. Moreover, through a case study, we demonstrate that tuning the configuration parameters related to the important metrics found by MIA results in higher performance improvements than through tuning the parameters related to the less important ones.
To View the Base Paper Abstract Contents
Now it is Your Time to Shine.
Great careers Start Here.
We Guide you to Every Step
Success! You're Awesome
Thank you for filling out your information!
We’ve sent you an email with your Final Year Project PPT file download link at the email address you provided. Please enjoy, and let us know if there’s anything else we can help you with.
To know more details Call 900 31 31 555
The WISEN Team