Abstract

In the last few years, several processing approaches have emerged to deal with Big Data. Exploiting on-the-fly computation, Data Stream Processing (DSP) applications can process unbounded streams of data to extract valuable information in near real time. To keep up with the high volume of data produced daily, the operators that compose a DSP application can be replicated and placed on multiple, possibly distributed, computing nodes, so as to process the incoming data flow in parallel. In this paper, we present Optimal DSP Replication and Placement (ODRP), a unified general formulation of the operator replication and placement problem that takes into account the heterogeneity of application requirements and infrastructural resources. A key feature of ODRP is the joint optimization of operator replication and placement. We evaluate the proposed model through a set of numerical experiments that demonstrate its flexibility and the benefits that derive from the joint optimization.
