A platform for big data analytics on distributed scale-out storage system

Kyar Nyo Aye, Thandar Thein

doi:10.1504/ijbdi.2015.069088

Open Access

Abstract

Big data analytics is the process of examining large amounts of data of various types to uncover hidden patterns, unknown correlations and other useful information. Hadoop-based platforms have emerged to deal with big data. In Hadoop, the NameNode stores metadata in a single system's memory, which is a performance bottleneck for scale-out. The Gluster file system, by contrast, has no performance bottlenecks related to metadata. To achieve high performance, scalability and fault tolerance for big data analytics, a big data platform is proposed. The proposed platform consists of a big data storage component and a big data processing component. Both the Hadoop big data platform and the proposed big data platform are implemented on clusters of commodity Linux virtual machines, and performance evaluations are conducted. According to the evaluation analysis, the proposed big data platform provides better scalability, better fault tolerance, and faster query response time than the Hadoop platform.
