A clustering-based analysis of DPI-labeled video flow characteristics in cellular networks

Johan Garcia

doi:10.23919/inm.2017.7987420

Abstract

Using a specially instrumented deep packet inspection (DPI) appliance placed inside the core network of a commercial cellular operator, we collect data from almost four million flows produced by a ‘heavy-hitter’ subset of the customer base. The data contains per-packet information for the first 100 packets of each flow, along with the classification made by the DPI engine. Unsupervised learning is applied to this data to obtain clusters of typical video flow behaviors, with the aim of quantifying the number of such clusters and examining their characteristics. Among the flows that the DPI engine identifies as belonging to video applications, a subset are actually video application signaling flows or other flows that do not carry actual video data transfers. Since DPI-labeled data can be used to train supervised machine learning models that identify flows carrying video transfers in encrypted traffic, it is important to examine the potential presence and structure of such ‘noise’ flows in the ground truth. In this study, K-means and DBSCAN are used to cluster the flows that the DPI engine marks as coming from a video application. The clustering techniques identify a set of 4 to 6 clusters with archetypal flow behaviors, and a subset of these clusters are found to represent flows that do not actually transfer video data.
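The abstract does not list the per-flow features or parameter settings used, so the following is only a minimal Python sketch of the general pipeline under stated assumptions. Each flow is reduced from its first 100 packets to a small feature vector, the features are z-scored, and the vectors are clustered with both K-means and DBSCAN. The three features in the hypothetical `flow_features` (mean packet size, downlink byte fraction, mean inter-arrival time) are illustrative choices. The `Packet` record layout, the toy `synthetic_flow` generator, the farthest-first K-means initialization, and the values `k=2`, `eps=1.0` and `min_pts=3` are likewise assumptions made for the example, not the study's settings.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical packet record: (timestamp in s, size in bytes, True if downlink).
Packet = Tuple[float, int, bool]

def flow_features(packets: Sequence[Packet], limit: int = 100) -> List[float]:
    """Illustrative per-flow features from the first `limit` packets (assumed, not the paper's)."""
    pkts = list(packets[:limit])
    sizes = [size for _, size, _ in pkts]
    total = sum(sizes)
    down = sum(size for _, size, is_down in pkts if is_down)
    gaps = [b[0] - a[0] for a, b in zip(pkts, pkts[1:])]
    return [
        total / len(pkts),                       # mean packet size (bytes)
        down / total if total else 0.0,          # fraction of bytes sent downlink
        sum(gaps) / len(gaps) if gaps else 0.0,  # mean inter-arrival time (s)
    ]

def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each feature column so no single unit dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points: List[List[float]], k: int, iters: int = 50) -> List[int]:
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    def assign() -> List[int]:
        # Each point goes to its nearest centroid (Euclidean distance).
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign()

def dbscan(points: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN; returns a cluster id per point, with -1 marking noise."""
    def neighbors(i: int) -> List[int]:
        # Includes i itself, so min_pts counts the point being examined.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(j_seeds)
    return labels

def synthetic_flow(kind: str, rng: random.Random) -> List[Packet]:
    """Toy flows, not real data: 'video' is bulk downlink, 'signal' is small and chatty."""
    t, pkts = 0.0, []
    for i in range(100 if kind == "video" else 20):
        if kind == "video":
            t += rng.uniform(0.0005, 0.002)
            pkts.append((t, rng.randint(1300, 1500), True))
        else:
            t += rng.uniform(0.2, 0.8)
            pkts.append((t, rng.randint(100, 400), i % 2 == 0))
    return pkts

rng = random.Random(7)
kinds = ["video"] * 10 + ["signal"] * 10
X = standardize([flow_features(synthetic_flow(kind, rng)) for kind in kinds])
km_labels = kmeans(X, k=2)
db_labels = dbscan(X, eps=1.0, min_pts=3)
print("K-means:", km_labels)
print("DBSCAN: ", db_labels)
```

With these toy flows, both methods should separate the bulk-download flows from the small, signaling-like flows. The two algorithms differ in one useful way: K-means forces every flow into some cluster, whereas DBSCAN can leave flows that fit no dense group labeled as noise (`-1`). On real data the number of K-means clusters also has to be chosen, for example by comparing results across several values of `k`.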
