Abstract

We live in the information age, and new information is produced every second. This data drives major technological change and enables new approaches to decision making. Storing such data for later analysis and computation is difficult, and several studies in this area aim to make future processing effective. Processing and analysing such huge volumes of data is equally challenging: existing technologies suffer from bottlenecks, performance overheads, and further challenges such as scalability. MapReduce is a computing paradigm and a popular model for distributed data analysis, and Spark is a commonly used data analysis framework. This paper surveys several big data technologies, how they handle large-scale data, and the difficulties in existing approaches; it also reviews some performance bottlenecks and techniques for avoiding them, before focusing on the Resilient Distributed Dataset (RDD) and how it is optimized.
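As a point of reference for the RDD abstraction the survey focuses on, the sketch below shows a minimal Spark word count in Scala. It is illustrative only and not taken from the paper: the input path "input.txt", the local master setting, and the object name RddWordCount are assumptions made for the example.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the RDD abstraction discussed above (illustrative only).
object RddWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RddWordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Transformations (flatMap, map, reduceByKey) build a lazy lineage;
    // nothing is computed until an action such as collect() is invoked.
    val counts = sc.textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Caching an RDD is one of the optimizations surveys of Spark typically
    // discuss: it avoids recomputing the lineage when the RDD is reused.
    counts.cache()

    counts.collect().foreach { case (word, n) => println(s"$word: $n") }
    sc.stop()
  }
}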
