Abstract
This article presents an algorithm to detect outliers in seasonal, univariate network traffic data using Gaussian Mixture Models (GMMs). Additionally, we show that this methodology can easily be implemented in a big data scenario and delivers the required information to a security analyst in an efficient manner. The unsupervised clustering algorithm GMM is modified such that all data points in a set are labelled as either outliers or normal data points. In this article, the algorithm is evaluated only on time series data obtained from network traffic; however, it can easily be modified for other types of seasonal, univariate big data sets. Detecting outliers in network traffic data occurs in two stages. First, GMMs are built for training data in each time bin of the seasonal time series. Outliers or anomalies in this training data set are detected and removed by examining the probability associated with each data point. Second, GMMs are rebuilt on the training (historical) data after outlier removal, and the re-computed GMMs are used to detect outliers in test data. Results are compared to traditional methods of outlier detection, which usually treat all data in a set as coming from a single probability density function.
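The two-stage procedure described above can be illustrated with a short sketch. The snippet below is an assumption-laden illustration, not the authors' implementation: it uses scikit-learn's GaussianMixture, an arbitrary two-component mixture, a 1st-percentile log-likelihood cut-off as the outlier threshold, and hypothetical per-hour binning with synthetic traffic data.

```python
# Hedged sketch (not the paper's exact implementation): per-time-bin GMM
# outlier detection. Component count, percentile threshold, hour-of-day
# binning, and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_bin_model(train_values, n_components=2, outlier_percentile=1.0):
    """Stage 1: fit a GMM to one seasonal time bin, drop low-likelihood points, refit."""
    X = np.asarray(train_values, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    log_lik = gmm.score_samples(X)                       # per-point log-likelihood
    cutoff = np.percentile(log_lik, outlier_percentile)  # low-probability tail
    clean = X[log_lik >= cutoff]                         # remove training outliers
    gmm_clean = GaussianMixture(n_components=n_components, random_state=0).fit(clean)
    threshold = np.percentile(gmm_clean.score_samples(clean), outlier_percentile)
    return gmm_clean, threshold

def flag_outliers(gmm, threshold, test_values):
    """Stage 2: flag test points whose likelihood under the rebuilt GMM is below threshold."""
    X = np.asarray(test_values, dtype=float).reshape(-1, 1)
    return gmm.score_samples(X) < threshold

# Hypothetical example: one model per hour-of-day bin of training traffic.
rng = np.random.default_rng(0)
train_by_hour = {h: rng.normal(100 + 20 * np.sin(h), 5, size=500) for h in range(24)}
models = {h: fit_bin_model(v) for h, v in train_by_hour.items()}
gmm, thr = models[12]
print(flag_outliers(gmm, thr, [90.0, 92.0, 400.0]))      # the 400.0 burst should flag True
```

The same pattern parallelises naturally over time bins (one independent model per bin), which is what makes a big data deployment straightforward under these assumptions.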