Abstract

Machine Learning (ML) and Artificial Intelligence (AI) depend on data sources to train and improve their algorithms and to make predictions. With the digital revolution and current paradigms like the Internet of Things, this information is shifting from static datasets to continuous data streams. However, most of the ML/AI frameworks in use today are not fully prepared for this shift. In this paper, we propose Kafka-ML, a novel and open-source framework that enables the management of ML/AI pipelines through data streams. Kafka-ML provides an accessible and user-friendly Web user interface where users can easily define ML models and then train, evaluate, and deploy them for inference. Kafka-ML itself and the components it deploys are fully managed through containerization technologies, which ensure portability and easy distribution and provide features such as fault tolerance and high availability. Finally, a novel approach has been introduced to manage and reuse data streams, which may eliminate the need for data storage or file systems.
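
To make this concrete, below is a minimal sketch of the kind of model definition a user might submit through the Kafka-ML Web UI, assuming a TensorFlow/Keras model as the target framework; the layer sizes and input shape are illustrative assumptions, not details taken from the paper.

    # Illustrative sketch (assumed details): a compiled tf.keras model of the
    # kind a Kafka-ML user could paste into the Web UI with a few lines of code.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),  # assumed input size
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

Kafka-ML would then take over training, evaluation, and deployment of this model against the configured data streams.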

Highlights

  • In this digital era, information is continuously acquired and processed everywhere, from many sources and for many purposes and sectors

  • Kafka-Machine Learning (ML) is characterized by its accessibility and ease of use: users need only a few lines of source code to define an ML model in its Web UI, and from there they can control the whole ML/Artificial Intelligence (AI) pipeline, creating configurations to evaluate different ML models and training, validating, and deploying trained models for inference (a hypothetical data-stream feed is sketched after these highlights)

  • Kafka-ML offers an innovative and open-source solution that manages the daily tasks performed by many ML/AI researchers and developers worldwide
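
As a hypothetical companion to these highlights, the sketch below shows how a data stream could be fed into a Kafka topic for Kafka-ML to consume during training or inference, using the kafka-python client; the broker address, topic name, and message format are assumptions for illustration only.

    # Hypothetical example: publishing a stream of samples to a Kafka topic
    # that a Kafka-ML training or inference job would read from.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers='localhost:9092',          # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    )

    # Each message carries one sample (features and label) as JSON.
    for sample in [{'features': [0.1] * 10, 'label': 1},
                   {'features': [0.9] * 10, 'label': 0}]:
        producer.send('training-data', sample)       # assumed topic name

    producer.flush()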

Summary

Introduction

Information is continuously acquired and processed everywhere, from many sources and for many purposes and sectors. Companies like Facebook [2] process millions of photos every day to detect inappropriate content, creating a continuous data stream that ML/AI algorithms and systems must handle. With the rise of the Internet of Things (IoT) [3], new sources of data have been enabled in the Internet era, with a forecast of 500 billion connected devices by 2030 [4]. Paradigms such as Industry 4.0, connected cars, and smart cities have become a possibility and, more importantly, they have contributed to the digitization of services in the physical world. In contrast to message queues, publish/subscribe systems allow multiple consumers to receive each message in a topic. Apache Kafka satisfies both requirements by combining the two models through consumer groups, as sketched below.
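
The sketch below illustrates how the two delivery models coexist in Kafka: consumers that share a group_id split a topic's partitions among themselves, as in a message queue, while consumers in different groups each receive every message, as in publish/subscribe. It uses the kafka-python client, and the broker address, topic, and group names are illustrative assumptions.

    # Illustrative sketch: queue-like and publish/subscribe-like consumption
    # of the same topic, distinguished only by the consumer group.
    from kafka import KafkaConsumer

    # Consumers sharing 'inference-workers' divide the topic's partitions
    # between them, so each message is processed by only one of them (queue).
    worker = KafkaConsumer('sensor-events',
                           bootstrap_servers='localhost:9092',
                           group_id='inference-workers')

    # A consumer in a different group independently receives the full stream
    # (publish/subscribe).
    auditor = KafkaConsumer('sensor-events',
                            bootstrap_servers='localhost:9092',
                            group_id='auditing')

    for message in worker:
        print(message.value)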
