Abstract

Big Data Analytics is an innovative approach for extracting information from very large data warehouse systems. It compresses high volumes of data into clusters using MapReduce and HDFS. However, extracting data and storing it in Hadoop clusters is time-consuming, and the proposed system addresses the time delay in the shuffle phase of MapReduce caused by scheduling and sequencing. To improve processing speed, this work uses a Compressed Elastic Search Index (CESI) together with a MapReduce-Based Next Generation Sequencing Approach (MRBNGSA). The approach increases the speed of data retrieval from HDFS clusters because of the way the data is stored: only the metadata is kept in HDFS, which consumes far less memory at runtime than the full data volume would. This reduces the CPU utilization and memory allocation of the ResourceManager in the Hadoop framework and improves data processing speed, so that the time delay is reduced and latency is minimized.
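
The key mechanism the abstract describes, keeping a small compressed index of metadata rather than the raw records, can be sketched in a few lines. This is not the authors' CESI implementation; it is a self-contained illustration, with hypothetical record keys, of why indexing only metadata (offset and compressed length) keeps runtime memory far below the data volume:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

/** Minimal sketch: the index holds only compact metadata, never the records themselves. */
public class CompressedIndexSketch {

    /** Per-record metadata; the compressed bytes themselves would live in HDFS blocks. */
    static class Entry {
        final long offset;
        final int compressedLength;
        Entry(long offset, int compressedLength) {
            this.offset = offset;
            this.compressedLength = compressedLength;
        }
        @Override public String toString() {
            return "offset=" + offset + ", compressedLength=" + compressedLength;
        }
    }

    private final Map<String, Entry> index = new HashMap<>();
    private long nextOffset = 0;

    /** Compress one record and remember only where it would sit and how big it is. */
    public byte[] add(String key, String record) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(record.getBytes(StandardCharsets.UTF_8));
        }
        byte[] compressed = buf.toByteArray();
        index.put(key, new Entry(nextOffset, compressed.length));
        nextOffset += compressed.length;
        return compressed; // a real system would append these bytes to a file in HDFS
    }

    /** O(1) metadata lookup; no data is scanned or decompressed. */
    public Entry lookup(String key) {
        return index.get(key);
    }

    public static void main(String[] args) throws IOException {
        CompressedIndexSketch idx = new CompressedIndexSketch();
        idx.add("record-42", "a large raw record that would normally bloat the index ...");
        System.out.println(idx.lookup("record-42"));
    }
}
```

In the proposed system the compressed payload would sit in HDFS and the Elasticsearch index would hold the metadata; the HashMap above merely stands in for that index.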

Highlights

  • In this era, big data is driving a new revolution in the day-to-day activities of social media, health care, the banking sector, the military, and industry

  • Hadoop version 2.5.1 has been installed in this system setup, and every computer in the cluster runs Ubuntu Linux with kernel 2.6.24

  • The overall performance improvement of this approach comes from reducing the time taken to complete jobs that fail because of node failure, thereby increasing processing speed. The approach addresses the latency and throughput issues of the MapReduce stage of the Hadoop framework in Big Data Analytics (see the sketch below)
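
The shuffle phase whose latency this work targets sits between the map and reduce steps of every Hadoop job. The sketch below is the stock Hadoop 2.x word-count job, not the proposed MRBNGSA; it marks where map output enters the shuffle and where a combiner can shrink shuffle traffic:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    /** Map: emit (word, 1); these pairs are what the shuffle phase sorts and moves. */
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // enters the shuffle from here
            }
        }
    }

    /** Reduce: values for each word arrive grouped and sorted after the shuffle. */
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // combiner shrinks shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```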

Introduction

In this era, big data is driving a new revolution in the day-to-day activities of social media, health care, the banking sector, the military, and industry. Nowadays, controlling the data generated by humans as well as machines is not an easy job with older techniques, and the wide variety of formats in which that data arrives poses a major problem: with older MapReduce-based processing methods, multiple passes over the data and real-time data integration are not possible. In Hadoop, many clients submit jobs for execution, and these are handled by the JobTracker or ResourceManager. If Hadoop 1.x is used in the cluster, tasks are controlled by the JobTracker; under Hadoop 2.x, the ResourceManager takes that role, and a Secondary NameNode may be used, which maintains a checkpoint copy of the NameNode's metadata from the cluster.
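
Because the proposed scheme keeps only metadata in HDFS, it helps to see that a metadata query is answered entirely by the NameNode, without touching any DataNode blocks. Below is a minimal sketch using the standard Hadoop FileSystem API; the cluster URI and input path are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch: read file metadata from the NameNode without reading any data blocks. */
public class HdfsMetadataSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical cluster address; replace with the real fs.defaultFS value.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            // listStatus is served entirely from the NameNode's in-memory metadata.
            for (FileStatus status : fs.listStatus(new Path("/data/input"))) {
                System.out.printf("%s  size=%d  blockSize=%d  replication=%d%n",
                        status.getPath(), status.getLen(),
                        status.getBlockSize(), status.getReplication());
            }
        }
    }
}
```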
