Detecting Malicious DNS over HTTPS Traffic in Domain Name System using Machine Learning Classifiers

Yaser M Banadaki

doi:10.12691/jcsa-8-2-2

Abstract

This paper presents a systematic two-layer approach for detecting DNS over HTTPS (DoH) traffic and distinguishing Benign-DoH traffic from Malicious-DoH traffic using six machine learning algorithms. The capability of machine learning classifiers is evaluated considering their accuracy, precision, recall, and F-score, confusion matrices, ROC curves, and feature importance. The results show that LGBM and XGBoost algorithms outperform the other algorithms in almost all the classification metrics reaching the maximum accuracy of 100% in the classification tasks of layers 1 and 2. LGBM algorithms only misclassified one DoH traffic test as non-DoH out of 4000 test datasets. It has also found that out of 34 features extracted from the CIRA-CIC-DoHBrw-2020 dataset, SourceIP is the critical feature for classifying DoH traffic from non-DoH traffic in layer one followed by DestinationIP feature. However, only DestinationIP is an important feature for LGBM and gradient boosting algorithms when classifying Benign-DoH from Malicious-DoH traffic in layer 2.

Full Text