Optimal resource provisioning is essential for scalable big data analytics. However, it has been difficult to accurately forecast the resource requirements before the actual deployment of these applications as their resource requirements are heavily application and data dependent. This paper identifies the existence of effective memory resource requirements for most of the big data analytic applications running inside JVMs in distributed Spark environments. Provisioning memory less than the effective memory requirement may result in rapid deterioration of the application execution in terms of its total execution time. A machine learning-based prediction model is proposed in this paper to forecast the effective memory requirement of an application given its service level agreement. This model captures the memory consumption behavior of big data applications and the dynamics of memory utilization in a distributed cluster environment. With an accurate prediction of the effective memory requirement, it is shown that up to 60 percent savings of the memory resource is feasible if an execution time penalty of 10 percent is acceptable. The accuracy of the model is evaluated on a physical Spark cluster with 128 cores and 1TB of total memory. The experiment results show that the proposed solution can predict the minimum required memory size for given acceptable delays with high accuracy, even if the behavior of target applications is unknown during the training of the model.
To View the Base Paper Abstract Contents
Now it is Your Time to Shine.
Great careers Start Here.
We Guide you to Every Step
Success! You're Awesome
Thank you for filling out your information!
We’ve sent you an email with your Final Year Project PPT file download link at the email address you provided. Please enjoy, and let us know if there’s anything else we can help you with.
To know more details Call 900 31 31 555
The WISEN Team