Abstract

The complexity of the Internet and the volume of network traffic have increased dramatically in the last few years, making it more challenging to design scalable Network Traffic Monitoring and Analysis (NTMA) systems. Critical NTMA applications, such as the detection of network attacks and anomalies, require fast mechanisms for on-line analysis of thousands of events per second, as well as efficient techniques for off-line analysis of massive historical data. The high dimensionality of the network data provided by current network monitoring systems opens the door to the widespread application of machine learning approaches to improve the detection and classification of network attacks and anomalies, but this higher dimensionality comes with extra data processing overhead. In this paper, we present Big-DAMA, a big data analytics framework (BDAF) for NTMA applications. Big-DAMA is a flexible BDAF capable of storing and analyzing large amounts of structured and unstructured data from heterogeneous sources, with both stream and batch processing capabilities. It uses off-the-shelf big data storage and processing engines, decoupling separate engines for stream processing, batch processing and queries, following a Data Stream Warehouse (DSW) paradigm. Big-DAMA implements several algorithms for anomaly detection and network security based on supervised and unsupervised machine learning (ML) models, built on off-the-shelf ML libraries. We apply Big-DAMA to the detection of different types of network attacks and anomalies, benchmarking multiple supervised ML models. Evaluations are conducted on real network measurements collected at the WIDE backbone network, using the well-known MAWILab dataset for attack labeling. Big-DAMA can speed up computations by a factor of 10 compared to a standard Apache Spark cluster, and can be easily deployed in cloud environments using hardware virtualization technology.
