DeNNeS: deep embedded neural network expert system for detecting cyber attacks

Samaneh Mahdavifar,Ali A Ghorbani

doi:10.1007/s00521-020-04830-w

Abstract

With the advances in computing powers and increasing volumes of data, deep learning’s emergence has helped revitalize artificial intelligence research. There is a growing trend of applying deep learning techniques to image processing, speech recognition, self-driving cars, and even health-care. Recently, several deep learning models have been employed to detect a cyber threat such as network attack, malware infiltration, or phishing website; nevertheless, they suffer from not being explainable to security experts. Security experts not only do need to detect the incoming threat but also need to know the incorporating features that cause that particular security incident. To address this issue, in this paper, we propose a deep embedded neural network expert system (DeNNeS) that extracts refined rules from a trained deep neural network (DNN) architecture to substitute the knowledge base of an expert system. The knowledge base later is used to classify an unseen security incident and inform the final user of the corresponding rule that made that inference. We consider different rule extraction scenarios, and to prove the robustness of DeNNeS, we evaluate it on two cybersecurity datasets including UCI phishing websites dataset and Android malware dataset comprising more than 4000 Android APKs from several sources. The comparison results of DeNNeS with standalone DNN, JRip, and common machine learning algorithms show that DeNNeS with the retraining uncovered samples scenario outperforms other algorithms on both datasets. Furthermore, the extracted rules approximately reproduce the accuracy of the neural network from which they are derived. DeNNeS achieves an outstanding accuracy of $$97.5\%$$ and a negligible false positive rate of $$1.8\%$$ about $$2.4\%$$ higher and $$3.5\%$$ lower than the rule learner JRip on the phishing dataset. Moreover, DeNNeS outperforms random forest (RF), which produces the highest results among decision tree (DT), support vector machine, k-nearest neighbor, and Gaussian naive Bayes. Despite smaller training data in the malware dataset, DeNNeS achieves an accuracy of $$95.8\%$$ and $${F_{1}\,score}$$ of $$91.1\%$$ , much higher than JRip and RF.

Full Text