Abstract

As the volume of geo-distributed data grows rapidly, so does the need for effective methods to analyze it and extract valuable insights. The traditional approach to geo-distributed data analytics gathers all the required data into a single edge datacenter (edge DC) in one round of transmission and aggregation (centralized data aggregation). However, as data volume grows exponentially, centralized data aggregation becomes inefficient or infeasible because computing and network resources are limited. In this paper, we propose a geo-distributed data aggregation scheme for edge compute first networking (CFN) that jointly considers computation and communication resources. The proposed scheme optimizes two objectives: the first is to minimize the job completion time (JCT) by selecting cluster centers, dividing clusters, and provisioning lightpaths; the second is to reduce bandwidth consumption by reallocating routing and frequency slots based on the JCT. To achieve these objectives, we first formulate the multi-stage geo-distributed data aggregation problem as a linear programming (LP) model. To address the computational complexity of the LP model, we then propose a multi-stage geo-distributed data aggregation algorithm with joint computation and communication resources (MGDD-CC). Simulation results show that the proposed scheme reduces JCT and alleviates competition for bandwidth resources, and that it is more suitable for scenarios with better data aggregation effects and larger quantities of geo-distributed data.

