An In-Memory-Based Big Data Analytics with Two-Level Storage on Private Cloud

Nikkita Shekhar, Ambika Pawar

doi:10.1007/978-981-10-1708-7_109

Abstract

With the growing capacity of main memory, in-memory big data management and processing is maturing and is being used in many big data applications. It supports interactive data analysis by improving I/O throughput. Memory-centric distributed file systems such as Tachyon and in-memory cluster computing frameworks such as Apache Spark are being used for analytical problems where both speed and fault tolerance are mandatory. To achieve high-speed big data processing, we propose a system design built on a two-tier storage architecture that combines HDFS with the in-memory file system Tachyon. The architecture also includes Apache Spark, an open-source in-memory data processing tool, to analyse the big data. In this framework, we utilise main memory by integrating a caching algorithm to reduce data processing time. In the experimental evaluation, we compare the performance of traditional Hadoop MapReduce with that of this in-memory-based framework. We also survey the existing storage and computation infrastructures, their performance when integrated with one another, and their contribution to solving many I/O-intensive analytical problems.
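As an illustration of the two-tier idea only, the read path can be sketched as a small memory tier placed in front of a slower backing tier. Everything in this sketch is an assumption made for illustration: the class name `TwoTierStore`, the block identifiers, the plain dictionary standing in for HDFS, and the choice of least-recently-used (LRU) eviction. The abstract does not say which caching algorithm is used, and this is not the authors' implementation or the Tachyon API.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy two-tier store: a small in-memory LRU tier over a slower backing tier.

    Hypothetical illustration; not the Tachyon or HDFS API.
    """

    def __init__(self, backing, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.backing = backing        # stands in for the disk tier (e.g. HDFS)
        self.capacity = capacity      # maximum number of blocks kept in memory
        self.memory = OrderedDict()   # stands in for the memory tier
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        # Fast path: the block is already in the memory tier.
        if block_id in self.memory:
            self.memory.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.memory[block_id]

        # Slow path: fetch from the backing tier, then cache the block.
        self.misses += 1
        data = self.backing[block_id]  # raises KeyError if the block is unknown
        self.memory[block_id] = data
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # evict the least recently used block
        return data


# Assumed sample data: three blocks, memory tier holds at most two.
backing = {"b1": b"alpha", "b2": b"beta", "b3": b"gamma"}
store = TwoTierStore(backing, capacity=2)
for block_id in ["b1", "b2", "b1", "b3", "b2"]:
    store.read(block_id)
print(store.hits, store.misses, list(store.memory))
```

The hit and miss counters make the effect of the memory tier visible: repeated reads of a recently used block are served from memory, while reads of evicted blocks fall through to the backing tier. A different eviction policy could be substituted without changing the read interface.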
