IEEE Final Year Project Topic for CSE

Base Paper Title

Speed Up Big Data Analytics by Unveiling the Storage Distribution of Sub-datasets

IEEE Project Abstract

In this paper, we study the problem of sub-dataset analysis over distributed file systems, e.g., the Hadoop Distributed File System (HDFS). Our experiments show that the distribution of sub-datasets over HDFS blocks, which HDFS hides from applications, can often cause the corresponding analyses to suffer from seriously imbalanced or inefficient parallel execution. Specifically, the content clustering of sub-datasets results in some computational nodes carrying much more workload than others; furthermore, it leads to inefficient sampling of sub-datasets, as analysis programs will often read large amounts of irrelevant data. We conduct a comprehensive analysis of how imbalanced computing patterns and inefficient sampling occur. We then propose a storage-distribution-aware method, referred to as DataNet, to optimize sub-dataset analysis over distributed storage systems. First, we propose an efficient algorithm to obtain the meta-data of sub-dataset distributions. Second, we design an elastic storage structure called ElasticMap, based on the HashMap and BloomFilter techniques, to store the meta-data. Third, we employ distribution-aware algorithms for sub-dataset applications to achieve balanced and efficient parallel execution. Our proposed method can benefit different sub-dataset analyses with various computational requirements. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results show the performance benefits of DataNet.
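
The abstract does not give implementation details, so the following Python sketch is only one plausible reading of the ElasticMap idea and of a distribution-aware assignment, not the authors' code. It assumes that each block keeps exact byte counts in a hash map for sub-datasets at or above a size threshold, and records smaller ones only in a Bloom filter. The names ElasticMapSketch, threshold_bytes, small_estimate and assign_blocks are hypothetical, and the greedy scheduler ignores HDFS replica locality for brevity.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ElasticMapSketch:
    """Per-block meta-data (hypothetical layout, assumed for illustration)."""

    def __init__(self, threshold_bytes, small_estimate):
        self.threshold_bytes = threshold_bytes  # assumed cut-off for exact tracking
        self.small_estimate = small_estimate    # assumed size for Bloom-only entries
        self.exact = {}                         # sub-dataset id -> exact bytes in block
        self.small = BloomFilter()              # membership only

    def record(self, sub_id, size_bytes):
        if size_bytes >= self.threshold_bytes:
            self.exact[sub_id] = size_bytes
        else:
            self.small.add(sub_id)

    def estimate(self, sub_id):
        """Estimated bytes of sub_id stored in this block."""
        if sub_id in self.exact:
            return self.exact[sub_id]
        return self.small_estimate if sub_id in self.small else 0


def assign_blocks(block_maps, sub_id, nodes):
    """Greedy balancing: give the heaviest remaining block to the least-loaded node."""
    load = {node: 0 for node in nodes}
    plan = defaultdict(list)
    weights = {block: m.estimate(sub_id) for block, m in block_maps.items()}
    for block, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if weight == 0:
            continue  # block holds none of the sub-dataset, so it is never read
        target = min(load, key=load.get)
        load[target] += weight
        plan[target].append(block)
    return dict(plan), load


def build_demo():
    """Four blocks; sub-dataset "A" is clustered in three of them."""
    contents = {"b0": {"A": 900, "C": 10}, "b1": {"A": 500}, "b2": {"A": 400}, "b3": {"B": 700}}
    maps = {}
    for block, subs in contents.items():
        maps[block] = ElasticMapSketch(threshold_bytes=100, small_estimate=50)
        for sub_id, size in subs.items():
            maps[block].record(sub_id, size)
    return maps
```

With the demo data, assign_blocks(build_demo(), "A", ["n1", "n2"]) skips block b3 entirely and splits the remaining 1800 bytes evenly across the two nodes. This corresponds to the two effects the abstract targets: less irrelevant data read, and a more balanced workload.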
