Big Data Analytics: Analysis of Features and Performance of Big Data Ingestion Tools

Andreea Matacuta, Catalina Popa

doi:10.12948/issn14531305/22.2.2018.03

Abstract

The purpose of this study was to analyze the features and performance of some of the most widely used big data ingestion tools. The analysis covers three data ingestion tools developed by Apache: Flume, Kafka and NiFi. The study is based on information about the tools' functionality and performance, collected from sources such as articles, books and forums contributed by people who have used these tools in practice. The goal of the study is to compare these big data ingestion tools in order to recommend the tool that best satisfies specific needs. Based on the selected indicators, the results reveal that all three tools consistently deliver good results in big data ingestion, but NiFi is the best option in terms of functionality and Kafka is the best option in terms of performance.

Highlights

Over the last few years, technology has had a big impact on applications and on data processing, and organizations have begun to give more importance to data and to invest more in its collection and management
We first introduce some concepts about data ingestion and explain why choosing the right ingestion tool matters for processing big data. We then give a short description of the tools used in the analysis, together with some information about the Hadoop ecosystem
We find that a company choosing a data ingestion tool weighs four main criteria. The first is speed, the ability to ingest data rapidly. The second is platform support, the ability to connect with data stores. The third is scalability, the ability to scale the framework to large datasets. The fourth is the ability to extract and access data from sources without affecting their ability to execute transactions or their performance. We used these criteria in choosing NiFi, Kafka and Flume

Summary

Introduction

Technology has had a big impact on applications and on data processing, and organizations have begun to give more importance to data and to invest more in its collection and management. The final choice of tools was based on [18], a ranking of the top 18 data ingestion tools. In that ranking Amazon Kinesis is in first position and Flume is in second, followed by Apache Kafka and Apache NiFi. Based on this ranking, we decided that our paper would analyze three tools that are a main choice for companies and users.

The tools also differ in the data types they support:
- Flume works with the file formats SequenceFile, DataStream and CompressedStream.
- Kafka accepts data types such as JSON, POJO or Java bean, with arrays as the fastest option.
- NiFi uses a data object called a FlowFile.

The conclusion for this functionality is that no single tool can process all types of data, and the choice depends on the user's needs. Overall, from the point of view of functionality, our analysis indicates that NiFi is the best solution to use in a company.
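Kafka brokers store message keys and values as byte arrays. Richer types such as JSON documents or POJOs/Java beans reach the broker only after a client-side serializer has turned them into bytes. This extra conversion step is plausibly why raw arrays are described as the fastest option. The Python sketch below shows that serialize/deserialize round trip without any Kafka library. It assumes JSON encoding, and it uses a hypothetical `SensorReading` record in place of a Java bean. The helper names are illustrative and are not part of any Kafka client API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SensorReading:
    """Hypothetical record type standing in for a POJO / Java bean."""
    sensor_id: str
    timestamp_ms: int
    value: float


def serialize_json(record: SensorReading) -> bytes:
    """Encode a record as compact JSON and return the UTF-8 bytes,
    i.e. the byte array a Kafka message value ultimately is."""
    return json.dumps(asdict(record), separators=(",", ":")).encode("utf-8")


def deserialize_json(payload: bytes) -> SensorReading:
    """Inverse step, as a consumer-side deserializer would perform it."""
    return SensorReading(**json.loads(payload.decode("utf-8")))


if __name__ == "__main__":
    reading = SensorReading("s-42", 1530316800000, 21.5)
    raw = serialize_json(reading)
    print(raw)  # b'{"sensor_id":"s-42","timestamp_ms":1530316800000,"value":21.5}'
    # The round trip recovers an equal record.
    assert deserialize_json(raw) == reading
```

In a real client, a function like `serialize_json` would be plugged into the producer's serializer hook, for example the `value_serializer` parameter of kafka-python's `KafkaProducer`. Sending data that is already a byte array skips this step entirely.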

NiFi has implemented Kafka functionality
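NiFi reaches Kafka through processors such as PublishKafka and ConsumeKafka, which move data between FlowFiles and Kafka topics. A FlowFile pairs a map of string attributes with a byte payload, so publishing one as a Kafka message means deciding where each part goes. The Python sketch below assumes one simple mapping:
- the FlowFile content becomes the message value;
- a chosen attribute becomes the key;
- attributes carrying a hypothetical `kafka.header.` prefix become headers.

The `FlowFile` and `KafkaRecord` classes and the `to_kafka_record` function are illustrative only and are not NiFi's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FlowFile:
    """Simplified model of a NiFi FlowFile: string attributes plus byte content."""
    attributes: Dict[str, str]
    content: bytes


@dataclass
class KafkaRecord:
    """Hypothetical stand-in for a Kafka producer record."""
    topic: str
    key: Optional[bytes]
    value: bytes
    headers: List[Tuple[str, bytes]] = field(default_factory=list)


def to_kafka_record(ff: FlowFile, topic: str,
                    key_attribute: Optional[str] = None,
                    header_prefix: str = "kafka.header.") -> KafkaRecord:
    """Map a FlowFile onto a Kafka record (an assumed mapping):
    content -> value, one attribute -> key, prefixed attributes -> headers."""
    key = None
    if key_attribute is not None and key_attribute in ff.attributes:
        key = ff.attributes[key_attribute].encode("utf-8")
    # Strip the prefix from matching attribute names; sort for a stable order.
    headers = [
        (name[len(header_prefix):], val.encode("utf-8"))
        for name, val in sorted(ff.attributes.items())
        if name.startswith(header_prefix)
    ]
    return KafkaRecord(topic=topic, key=key, value=ff.content, headers=headers)
```

For example, take a FlowFile with attributes `{"sensor.id": "s-42", "kafka.header.source": "edge-1"}`. Mapping it with `key_attribute="sensor.id"` yields a record keyed by `b"s-42"` with the single header `("source", b"edge-1")`. The content travels unchanged as the value.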


Journal: Informatica Economica
Publication Date: Jun 30, 2018
Citations: 10
License type: cc-by


Similar Papers

Learning analytics for higher education: proposal of big data ingestion architecture
Meseret Yihun Amare ... T Kliestik
SHS Web of Conferences | VOL. 92
01 Jan 2020

Big Data Ingestion and Preparation Tools
Jaber Alwidian ... Maram Gnaim
Modern Applied Science | VOL. 14
27 Aug 2020

Chapter 5 Federated Query Processing
Kemele M Endris ... Maria-Esther Vidal
01 Jan 2020

Experiences and Lessons in Practice Using TPCx-BB Benchmarks
Kebing Wang ... Mike Riess
30 Dec 2017
