Abstract

This study analyzes the features and performance of some of the most widely used big data ingestion tools. It covers three tools developed by Apache: Flume, Kafka and NiFi. The analysis draws on information about each tool's functionality and performance, collected from articles, books and forums written by people who have used these tools in practice. The goal is to compare the tools in order to recommend the one that best satisfies a given set of needs. Based on the selected indicators, the results show that all three tools deliver good results in big data ingestion, but NiFi is the best option in terms of functionality and Kafka is the best in terms of performance.

Highlights

  • In recent years, technology has had a major impact on applications and on data processing, and organizations have begun to give more importance to data and to invest more in its collection and management

  • We first introduce the concept of data ingestion and explain why it matters for processing big data, then briefly describe the tools analyzed and give some background on the Hadoop ecosystem

  • We find that the main criteria a company applies when choosing a data ingestion tool are: the speed at which data is ingested; platform support, meaning the ability to connect to data stores; the ability to scale the framework to large datasets; and the ability to extract and access data from sources without affecting their performance or their ability to execute transactions. We used these criteria in selecting NiFi, Kafka and Flume


Summary

Introduction

Technology has had a major impact on applications and on data processing, and organizations have begun to give more importance to data and to invest more in its collection and management.

The final selection was based on [18], a ranking of the top 18 data ingestion tools in which Amazon Kinesis is first and Flume is second, followed by Apache Kafka and Apache NiFi. Based on this ranking, we decided that our paper would analyze three tools that are a leading choice for companies and users.

In terms of data types, Flume handles the file formats SequenceFile, DataStream and CompressedStream; Kafka accepts data types such as JSON, POJOs or Java beans, with arrays being the fastest option; and NiFi uses a data object called a FlowFile. The conclusion for this functionality is that no tool can process all types of data, so the decision depends on the user's needs. From the point of view of functionality, our analysis indicates that NiFi is the best solution for use in a company.

NiFi implemented Kafka functionality
Findings
Conclusions
