Abstract

Anomaly detection systems, also known as intrusion detection systems (IDSs), continuously monitor network traffic to identify malicious actions. Extensive research on building efficient IDSs has emphasized two essential characteristics. The first concerns finding an optimal feature selection, while the second concerns employing robust classification schemes. However, the advent of big data concepts in the anomaly detection domain and the appearance of sophisticated network attacks in the modern era require some fundamental methodological revisions to the way IDSs are developed. Therefore, we first identify two more significant characteristics in addition to the ones mentioned above. These are the need to employ specialized big data processing frameworks and the need to use appropriate datasets for validating a system's performance, both of which are largely overlooked in existing studies. We then set out to develop an anomaly detection system that comprehensively follows these four identified characteristics, i.e., the proposed system (i) performs feature ranking and selection using information gain and automated branch-and-bound algorithms, respectively; (ii) employs logistic regression and extreme gradient boosting techniques for classification; (iii) introduces bulk synchronous parallel processing to cater to the computational requirements of high-speed big data networks; and (iv) uses the real-time contemporary dataset of the Information Security Centre of Excellence at the University of New Brunswick for performance evaluation. We present experimental results that verify the efficacy of the proposed system.
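The feature-ranking step in (i) can be illustrated with a minimal sketch of information gain over discrete features. This is not the authors' implementation: the function names, the toy flow records, and the two feature names are hypothetical, and continuous traffic features would first need to be discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the entropy left after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, feature_names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = []
    for index, name in enumerate(feature_names):
        column = [row[index] for row in rows]
        gains.append((name, information_gain(column, labels)))
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy, made-up flow records (protocol, flag); not taken from the ISCX dataset.
rows = [("tcp", "SYN"), ("tcp", "ACK"), ("udp", "SYN"), ("udp", "ACK")]
labels = ["attack", "attack", "normal", "normal"]
ranking = rank_features(rows, labels, ["protocol", "flag"])
```

In this toy data the hypothetical `protocol` feature separates the two classes completely while `flag` carries no information about the label, so `protocol` is ranked first. A selection step such as branch-and-bound would then search over subsets of the ranked features.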

Highlights

  • This decade has witnessed tremendous growth in cyberspace and various computing devices. Proliferation of the Internet with these computing devices has enhanced efficiency and productivity in almost all the dimensions of life

  • Anomaly detection is a significant issue in computer networks

  • The other two characteristics combat the challenges introduced by large-scale networks and sophisticated network attacks, namely utilizing specialized big data computing engines and obtaining contemporary workloads to conduct performance evaluations of the proposed systems


Summary

Introduction

This decade has witnessed tremendous growth in cyberspace and various computing devices. Over the past several years, anomaly detection based on machine learning and data mining techniques has received considerable attention among researchers. Two important aspects hinder the progress of network IDS (NIDS) research and greatly need the attention of the IDS research community. They concern the decision to select an appropriate big data computing framework and to utilize adequate datasets for the evaluation of an IDS. We emphasize that the value and legitimacy of such decisions are as important as those of the other fundamental characteristics in the process of developing efficient IDSs. Building on the points addressed so far, we introduce a comprehensive IDS incorporating bulk synchronous parallel

Background and Related Work
Utilizing Machine Learning and Bulk Synchronous Parallel Computing Techniques
Proposed Framework
Data Preprocessing
Feature Ranking and Selection
Attack Recognition
Dataset and Experimental Setup
Performance Evaluation
Results and Discussion
Conclusions
