Classifying Categories of SCADA Attacks in a Big Data Framework

Krishna Madhuri Paramkusem, Ramazan S Aygun

doi:10.1007/s40745-018-0141-8

Abstract

Supervisory control and data acquisition (SCADA) systems monitor and control industrial control systems in many industrial and economic sectors, such as water treatment, power plants, railroads, and gas pipelines. Integrating SCADA systems with the internet and corporate enterprise networks for economic reasons exposes them to attackers who could remotely gain access, damage infrastructure, and thereby endanger people’s lives. Previous research has been limited by the simplicity of its datasets and the possible overfitting of models to training data. In this paper, we present a method for detecting and classifying malicious command and response packets in a SCADA network by analyzing attribute differences and the history of packets using k-means clustering. The solution classifies SCADA attacks with high accuracy using a big data framework comprising Apache Hadoop and Apache Mahout. Apache Mahout’s random forest classification algorithm is applied to a SCADA gas pipeline dataset to categorize attacks. When 70% of the data is used for training the classifier, our approach yields a 5–17% improvement in accuracy for the classification of read response attacks and a 2–8% improvement in accuracy for write command attacks, relative to using the original dataset.

Full Text