Abstract

We present a processing pipeline for flow-based traffic classification whose machine learning component is built on Deep Neural Networks (DNNs). The system is trained to predict, ahead of time, likely characteristics of real-world traffic flows from a campus network, e.g., a flow’s throughput or duration. DNN models are continuously trained and evaluated on a flow data stream collected from a university data center. Instead of the common binary classification into “mice” and “elephant” flows (throughput) or “short-term” and “long-term” flows (duration), predicted flow characteristics are quantized into three classes. We investigate various communication contexts (subsets of network traffic, e.g., only TCP) and flow feature groups (subsets of flow features, e.g., only a flow’s 5-tuple), both of which are supported through an enrichment strategy. We describe the data acquisition process in depth, including the preprocessing steps and the anonymization used to protect sensitive information. Additionally, we employ an accelerated variant of t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize network traffic data, which makes traffic characteristics and relations between communication flows understandable at a glance. Finally, we propose possible use cases and a high-level architecture for flow-based routing scenarios that utilize the developed pipeline.
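To illustrate what quantizing a flow characteristic into three classes can look like, the following is a minimal sketch, not the method used in this work. It assumes the class boundaries are taken as terciles of observed training values; the abstract does not state how the boundaries are chosen, and the function names `tercile_boundaries` and `quantize` are hypothetical.

```python
from bisect import bisect_right


def tercile_boundaries(values):
    """Return two cut points that split the values into three roughly equal parts.

    Assumption: tercile-based boundaries. The source does not specify
    how its three classes are delimited.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three values to form three classes")
    return ordered[n // 3], ordered[(2 * n) // 3]


def quantize(value, boundaries):
    """Map a value to class 0 (low), 1 (medium), or 2 (high).

    A value equal to a boundary is assigned to the higher class.
    """
    return bisect_right(boundaries, value)


# Hypothetical throughput samples (arbitrary units) used to derive boundaries.
training_throughputs = [10, 20, 30, 40, 50, 60]
boundaries = tercile_boundaries(training_throughputs)
labels = [quantize(v, boundaries) for v in training_throughputs]
```

The same two functions would apply unchanged to flow durations; only the training values differ. Fixed, domain-chosen thresholds would be an equally plausible alternative to terciles.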
