Abstract
The Domain Name System (DNS) is among the most ubiquitous and important protocols for network communication; however, security concerns regarding DNS have been on the rise, and demand for encrypted traffic has followed suit. Using a publicly available dataset, this work compares 10 different machine learning classifiers using stratified 10-fold cross-validation. The classifiers are used to determine the most effective and efficient way of detecting malicious DNS over Hypertext Transfer Protocol Secure (HTTPS) traffic, dubbed DoH traffic. Model performance is evaluated on Non-DoH vs. DoH traffic, then tested on benign vs. malicious DoH traffic. Additionally, this paper seeks to build upon existing research by removing noise and introducing feature selection methods and feature explainability to produce a better model for real-world deployment. After eliminating five overfitting features, our findings indicate that light gradient boosting machine (LGBM) yielded the highest accuracy-to-training-time ratio while approaching 0% error using the top 20 features.
Highlights
The Domain Name System (DNS) is a vital component of the modern internet
We discuss various machine learning classifiers used in our research: Decision Tree, Random Forest, LightGBM (LGBM), and XGBoost
Decision Tree (DT), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Gaussian Naive Bayes (GNB), Random Forest (RF), AdaBoost Classifier (AB), Gradient Boosting Classifier (GB), XGBoost (XGB), Extra Trees (ET), and light gradient boosting machine (LGBM) were trained on both layers of data
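The comparison described above, several classifiers scored with stratified 10-fold cross-validation and judged on both accuracy and training time, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: it assumes scikit-learn, uses a synthetic imbalanced dataset from `make_classification` in place of the DoH dataset, and covers only three of the ten classifiers. The helper name `compare_classifiers` and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, models, n_splits=10, seed=0):
    """Score each model with stratified k-fold CV; report accuracy and fit time."""
    # Stratification keeps the class ratio roughly equal in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring="accuracy")
        results[name] = {
            "accuracy": scores["test_score"].mean(),  # mean accuracy over folds
            "fit_time": scores["fit_time"].sum(),     # total training time (s)
        }
    return results


# Synthetic stand-in data (assumption): 20 features, 80/20 class imbalance.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    weights=[0.8, 0.2], random_state=0,
)

# A subset of the classifiers named above, with illustrative settings.
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "GNB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=25, random_state=0),
}

results = compare_classifiers(X, y, models)
for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.3f}, fit_time={r['fit_time']:.3f}s")
```

Reporting fit time alongside accuracy is what makes an accuracy-to-training-time comparison possible; the exact ratio used in the paper is not specified here, so the sketch leaves the two numbers separate.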
Summary
The Domain Name System (DNS) is a vital component of the modern internet. DNS improves user experience by translating human-readable names into Internet Protocol (IP) addresses, which are necessary for accessing websites and domains. DNS-over-Encryption protocols such as DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) were introduced to address DNS security concerns [2]. These techniques provide significant data, including response IP address, originating IP address, and query type. To establish an encrypted connection with a DoH server, the client sends a DNS request to resolve the Uniform Resource Identifier (URI) template and get the IP address of the server [8]. Once a DoH connection is downgraded to DNS, man-in-the-middle attacks, cache poisoning, DNS hijacking, and many other attacks can be performed [8]. Due to these security concerns, it is advantageous to detect malicious DoH traffic.
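For readers unfamiliar with what a DoH query looks like once the server's address is known, the following sketch builds the GET form of a request as defined in RFC 8484: a wire-format DNS query, base64url-encoded without padding, passed in the `dns` parameter. The summary above does not describe the encoding itself, so this is background under stated assumptions. The server URL `https://doh.example/dns-query` and the function names are hypothetical, and no network traffic is sent.

```python
import base64
import struct


def build_dns_query(name, qtype=1):
    """Build a minimal wire-format DNS query (QTYPE 1 = A, QCLASS 1 = IN)."""
    # Header: ID=0 (recommended by RFC 8484 for HTTP cache friendliness),
    # flags=0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(base_url, name, qtype=1):
    """Return the DoH GET URL carrying the query in the `dns` parameter."""
    wire = build_dns_query(name, qtype)
    # base64url with the trailing '=' padding stripped, per RFC 8484.
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{base_url}?dns={encoded}"


# Hypothetical DoH endpoint, used for illustration only.
url = doh_get_url("https://doh.example/dns-query", "www.example.com")
print(url)
```

Because the query travels inside an ordinary HTTPS request, an observer sees only encrypted traffic to the resolver, which is why detection has to rely on traffic features rather than on the DNS payload.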