A number of streaming technologies have appeared in recent years as a result of the rise of Big Data applications. Deciding which technology to adopt is not an easy task, not only because of the number of available data stream processing projects but also because they are continuously evolving. In this paper, we focus on how these issues have affected jMetalSP, a framework for dynamic multi-objective optimization that incorporates streaming features. jMetalSP allows the development of three-tier optimization workflows in which the central component is an optimizer that continuously solves a dynamic multi-objective optimization problem. This problem can change as a consequence of the analysis of data streams carried out by components that use the Apache Spark streaming engine. A third kind of component receives and processes the Pareto front approximations produced by the optimization algorithm. However, all jMetalSP elements are tightly coupled and linked to Spark, which makes it difficult to use a different streaming system. To overcome this issue, we have redesigned the jMetalSP architecture to remove its dependence on any particular streaming system. This way, popular Apache projects such as Spark Structured Streaming, Kafka Streams, or Flink can be used without changing the remaining components of the application. Furthermore, Kafka can be used for inter-process communication, which enables components to run on different nodes of a cluster, independently of their implementation languages, thanks to the serialization of data streams with Apache Avro. We show that the adopted solution provides a high degree of flexibility that enhances the usability of jMetalSP. To this end, we conduct a representative case study based on a transport problem, focusing on data representation and on the performance evaluation of the Spark, Flink, and Kafka systems.
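The decoupling described above can be illustrated with a minimal sketch. It is not the jMetalSP API: the names `StreamingDataSource`, `InMemorySource`, `DynamicProblem`, and `Optimizer` are hypothetical, the in-memory source merely stands in for a Spark-, Flink-, or Kafka-backed one, and the "optimizer" is a brute-force non-dominated filter over a fixed grid rather than a real multi-objective algorithm. The only point it makes is that the optimizer and the front consumers depend on an engine-neutral interface, not on a specific streaming system.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]
Front = List[Point]


class StreamingDataSource(ABC):
    """Hypothetical engine-neutral source: pushes each update to a callback."""

    @abstractmethod
    def run(self, on_update: Callable[[float], None]) -> None:
        ...


class InMemorySource(StreamingDataSource):
    """Stand-in for a source backed by a real streaming engine (assumption)."""

    def __init__(self, values: Iterable[float]) -> None:
        self._values = list(values)

    def run(self, on_update: Callable[[float], None]) -> None:
        for value in self._values:
            on_update(value)


class DynamicProblem:
    """Toy bi-objective problem; streamed data shifts the second objective."""

    def __init__(self) -> None:
        self.shift = 0.0

    def update(self, shift: float) -> None:
        self.shift = shift

    def evaluate(self, x: float) -> Point:
        return (x * x, (x - self.shift) ** 2)


def non_dominated(points: Front) -> Front:
    """Keep the points that no other point dominates (minimization)."""

    def dominates(a: Point, b: Point) -> bool:
        return all(i <= j for i, j in zip(a, b)) and a != b

    return [p for p in points if not any(dominates(q, p) for q in points)]


class Optimizer:
    """Re-solves the problem on every update and notifies front consumers."""

    def __init__(self, problem: DynamicProblem,
                 consumers: List[Callable[[Front], None]]) -> None:
        self.problem = problem
        self.consumers = consumers

    def on_update(self, shift: float) -> None:
        self.problem.update(shift)
        # Brute-force grid search over x in [-2, 2]; a placeholder for a
        # real dynamic multi-objective algorithm.
        candidates = [i / 10 for i in range(-20, 21)]
        front = non_dominated([self.problem.evaluate(x) for x in candidates])
        for consumer in self.consumers:
            consumer(front)
```

To wire the three tiers together, one would build an `Optimizer` with a list of consumer callbacks (for example, `fronts.append` on a plain list) and pass its `on_update` method to a source's `run`. Replacing `InMemorySource` with another `StreamingDataSource` subclass leaves the optimizer and the consumers untouched.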