Abstract

We live in the information age, and new information is produced every second. This data drives major technological change and enables new approaches to decision making. Storing such data for later analysis and computation is difficult, and several studies in this area aim to make future processing effective. Processing and analysing such huge volumes of data is equally challenging: existing technologies suffer from bottlenecks, performance overheads, and further challenges such as scalability. MapReduce is a computing paradigm and a popular model for distributed data analysis, and Spark is a commonly used data analysis framework. This paper surveys several big data technologies, how they handle large-scale data, and the difficulties in existing approaches; it also reviews some performance bottlenecks and techniques for avoiding them, before focusing on the Resilient Distributed Dataset (RDD) and how it is optimized.
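As a point of reference for the RDD abstraction the survey focuses on, the sketch below shows a minimal Spark word count in Scala. It is illustrative only and not taken from the paper: the input path "input.txt", the local master setting, and the object name RddWordCount are assumptions made for the example.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the RDD abstraction discussed above (illustrative only).
object RddWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RddWordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Transformations (flatMap, map, reduceByKey) build a lazy lineage;
    // nothing is computed until an action such as collect() is invoked.
    val counts = sc.textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Caching an RDD is one of the optimizations surveys of Spark typically
    // discuss: it avoids recomputing the lineage when the RDD is reused.
    counts.cache()

    counts.collect().foreach { case (word, n) => println(s"$word: $n") }
    sc.stop()
  }
}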
