Abstract

Complex science workflows involve very large data demands and resource-intensive computations. These demands require reliable high-speed networks that can optimize performance for application data flows. Classifying flows as large (elephant) or small (mice) allows networks to optimize performance by detecting and handling demands in real time. However, predicting elephant versus mice flows is extremely difficult because their definitions vary across networks. Machine learning techniques can help classify flows into two distinct clusters to identify characteristics of transfers. In this paper, we investigate unsupervised and semi-supervised machine learning approaches to classify flows in real time. We combine a Gaussian Mixture Model with an initialization algorithm to develop a novel general-purpose method that aids classification based on network sites (in terms of data transfers, flow rates, and durations). Our results show that the proposed algorithm is able to cluster elephants and mice with an accuracy rate of 90%. We analyzed one month of NetFlow reports from three ESnet site routers to train the model and predict clusters.
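
To make the clustering idea concrete, the following is a minimal sketch of a two-component Gaussian mixture fitted with expectation-maximization, not the method evaluated in the paper. It rests on several assumptions: each flow is reduced to a single feature (log10 of bytes transferred), whereas the paper also considers flow rates and durations; the initialization is a simple one-dimensional 2-means split, which stands in for the paper's initialization algorithm; and all function names (`kmeans_1d_init`, `fit_two_component_gmm`, `posterior`, `classify`) are hypothetical.

```python
import math
import random


def kmeans_1d_init(data, rounds=10):
    """Split 1-D data into two groups with 2-means seeded at min and max.

    Assumption: this stands in for the paper's initialization algorithm,
    whose details the abstract does not give.
    """
    lo, hi = min(data), max(data)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(rounds):
        cut = (lo + hi) / 2
        left = [x for x in data if x <= cut]
        right = [x for x in data if x > cut]
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return left, right


def posterior(x, weights, means, variances):
    """Return the responsibility of each component for the value x."""
    dens = [
        w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(dens)
    if total == 0.0:  # both densities underflowed; treat as undecided
        return [0.5, 0.5]
    return [d / total for d in dens]


def fit_two_component_gmm(values, iterations=50, var_floor=1e-6):
    """Fit a 1-D, two-component Gaussian mixture with EM."""
    data = list(values)
    groups = kmeans_1d_init(data)
    weights = [len(g) / len(data) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    variances = [
        max(sum((x - m) ** 2 for x in g) / len(g), var_floor)
        for g, m in zip(groups, means)
    ]
    for _ in range(iterations):
        # E-step: responsibilities under the current parameters.
        resp = [posterior(x, weights, means, variances) for x in data]
        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data))
            variances[k] = max(spread / nk, var_floor)
    return weights, means, variances


def classify(x, model):
    """Label x as 'elephant' if the larger-mean component is more likely."""
    weights, means, variances = model
    resp = posterior(x, weights, means, variances)
    elephant = 0 if means[0] > means[1] else 1
    return "elephant" if resp[elephant] > 0.5 else "mouse"


if __name__ == "__main__":
    # Synthetic log10(bytes) values for illustration only, not NetFlow data.
    random.seed(0)
    sample = [random.gauss(3.0, 0.5) for _ in range(900)]
    sample += [random.gauss(8.0, 0.5) for _ in range(100)]
    model = fit_two_component_gmm(sample)
    print(classify(math.log10(1e9), model), classify(math.log10(2000), model))
```

Because the mixture is fitted per data set, the boundary between the two clusters adapts to each site's traffic rather than relying on a fixed byte threshold, which is the motivation the abstract gives for a clustering approach.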
