Abstract

As main-memory capacity grows, in-memory big data management and processing is maturing and being adopted in many big data applications. It supports interactive data analysis by improving I/O throughput. Memory-centric distributed file systems such as Tachyon and in-memory cluster computing frameworks such as Apache Spark are used for analytical problems in which both speed and fault tolerance are mandatory. To achieve high-speed big data processing, we propose a system design built on a two-tier storage architecture that combines HDFS with the in-memory file system Tachyon. The architecture also uses Apache Spark, an open-source in-memory data processing tool, to analyse the data. Within this framework, we utilise main memory by integrating a caching algorithm to reduce data processing time. In our experiments, we compare the performance of traditional Hadoop MapReduce with that of this in-memory framework. We also survey existing storage and computation infrastructures, their performance when integrated, and their contribution to solving I/O-intensive analytical problems.
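
To make the two-tier idea concrete, the following is a minimal sketch of a read-through cache in which a small memory tier sits in front of a slower backing tier. The abstract does not name the caching algorithm, so least-recently-used (LRU) eviction is an assumption here. The class `TwoTierStore` and its fields are hypothetical names, a plain dictionary stands in for the disk tier, and nothing below uses the Tachyon, HDFS, or Spark APIs.

```python
from collections import OrderedDict


class TwoTierStore:
    """Hypothetical read-through cache: memory tier in front of a backing tier."""

    def __init__(self, backing, capacity):
        self.backing = backing        # stands in for the disk tier (assumption)
        self.capacity = capacity      # maximum number of blocks kept in memory
        self.memory = OrderedDict()   # stands in for the in-memory tier
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.memory:
            # Memory hit: mark the block as most recently used.
            self.memory.move_to_end(key)
            self.hits += 1
            return self.memory[key]
        # Memory miss: fetch from the backing tier and keep a copy in memory.
        self.misses += 1
        value = self.backing[key]
        self.memory[key] = value
        if len(self.memory) > self.capacity:
            # LRU eviction (assumed policy): drop the least recently used block.
            self.memory.popitem(last=False)
        return value


# Illustrative data only; block names and contents are made up.
backing = {f"block-{i}": f"data-{i}" for i in range(4)}
store = TwoTierStore(backing, capacity=2)
for key in ["block-0", "block-1", "block-0", "block-2", "block-1"]:
    store.read(key)
print(store.hits, store.misses)
```

Repeated reads of a block that is still resident are served from the memory tier, which is the effect the proposed design relies on to reduce processing time. The hit and miss counters give a simple way to see how the memory capacity and access pattern interact.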
