Abstract

Every day, terabytes of data are generated, coming mostly from modern information systems, new technologies, the Internet of Things (IoT) and cloud computing. With the ever-expanding number of alternatives, choosing machine learning tools to analyse such massive volumes of data can be difficult, and effort is required at various stages to extract information for decision making. As big data analysis is currently an active area of research, this paper aims to help researchers understand machine learning and focuses on exploring the impact of open-source tools for the processing of big data. Machine learning was used to analyse three open-source tools: Hadoop, Spark and Presto. These tools were evaluated using scalability, fault tolerance and latency as the metrics. Presto was found to be efficient and fast in processing huge volumes of data, Spark played the greater role in precision, and Hadoop was found to be the best in fault tolerance. In conclusion, the paper provides a platform, with various steps for exploring big data, that could open new spheres of research development.
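To illustrate the latency metric mentioned above, the following is a minimal, hypothetical sketch in pure Python (not tied to Hadoop, Spark or Presto, and not the paper's actual methodology): it times one end-to-end run of a toy word-count job, the kind of wall-clock measurement commonly used to compare query engines.

```python
import time
from collections import Counter

def word_count(lines):
    # Toy word-count job standing in for a distributed query.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def measure_latency(job, data):
    # Latency here means wall-clock time for one end-to-end run.
    start = time.perf_counter()
    result = job(data)
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == "__main__":
    data = ["big data tools", "big data analysis"] * 1000
    result, elapsed = measure_latency(word_count, data)
    print(result["data"], elapsed)
```

In practice the same job would be submitted to each engine in turn and the elapsed times compared, repeated over several runs to average out noise.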
