Abstract

Every day, terabytes of data are generated, coming mostly from modern information systems, new technologies, the Internet of Things (IoT) and cloud computing. With the ever-expanding number of alternatives, choosing machine learning tools to analyse such massive volumes of data can be difficult, and effort is required at various stages to extract information for decision making. As big data analysis is currently an active area of research, this paper aims to help researchers understand machine learning and focuses on exploring the impact of open-source tools for the processing of big data. Machine learning was used to analyse three open-source tools: Hadoop, Spark and Presto. These tools were evaluated using scalability, fault tolerance and latency as the metrics. Presto was found to be efficient and fast in processing huge volumes of data, Spark played the greater role in precision, and Hadoop was found to be the best in fault tolerance. In conclusion, the paper provides a platform, with various steps for exploring big data, that could open new spheres of research development.
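To illustrate the latency metric mentioned above, the following is a minimal, hypothetical sketch in pure Python (not tied to Hadoop, Spark or Presto, and not the paper's actual methodology): it times one end-to-end run of a toy word-count job, the kind of wall-clock measurement commonly used to compare query engines.

```python
import time
from collections import Counter

def word_count(lines):
    # Toy word-count job standing in for a distributed query.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def measure_latency(job, data):
    # Latency here means wall-clock time for one end-to-end run.
    start = time.perf_counter()
    result = job(data)
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == "__main__":
    data = ["big data tools", "big data analysis"] * 1000
    result, elapsed = measure_latency(word_count, data)
    print(result["data"], elapsed)
```

In practice the same job would be submitted to each engine in turn and the elapsed times compared, repeated over several runs to average out noise.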
