This study addresses the challenges of detecting DNS over HTTPS (DoH) traffic, a relatively new protocol that has not been extensively researched. The detection methods discussed include TLS inspection, application logging, and open-source tools such as Zeek and RITA. TLS inspection, which involves decrypting and analyzing traffic, is the most intrusive and requires full control over the network and client configurations. Application logging, such as that available in Mozilla Firefox, requires administrative control over client systems, which may be impractical. Zeek analyzes network logs to identify domains accessed without regular DNS queries, while JA3 fingerprints and RITA focus on detecting malicious DoH traffic by analyzing TLS handshake parameters and beacon-like activity, respectively. Additionally, maintaining up-to-date blacklists of IP addresses and SNI values can help identify DoH traffic, but this approach faces scalability and evasion challenges. The study finds that no current solution is fully practical: many require excessive administrative overhead or fail to scale effectively. A hybrid approach that combines machine learning models with traffic analysis, as illustrated by the CIRA-CIC-DoHBrw-2020 dataset, is proposed for more effective detection of malicious DoH traffic. The architecture of a two-stage DoH traffic identification system is presented; it consists of three subsystems: traffic, training and evaluation, and identification. The subsystems operate sequentially, and together they cover traffic identification, model training and testing, and processing of DoH protocol information. The model is then cross-validated: it is trained K times, with each iteration using a different fold as the validation set while the remaining folds serve as the training set.
The aim of this work: development and implementation of a DoH traffic identification system which, unlike existing solutions, is based on a hybrid approach to identifying malicious traffic that combines open tools for detecting encrypted DNS traffic with specialized machine learning models.
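
The K-fold cross-validation step mentioned above can be sketched as follows. This is a minimal illustration of the fold-splitting logic only, not the authors' implementation: the helper name `k_fold_indices`, the sample count, and the value of K are assumptions made for the example, and in practice the samples would normally be shuffled (and possibly stratified by class) before splitting.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds.

    Hypothetical helper for illustration; no shuffling is performed.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = indices[start:stop]          # this fold validates
        train = indices[:start] + indices[stop:]  # all other folds train
        yield train, validation
        start = stop


# Example: 10 flow records split into K = 5 folds (both values are assumed).
folds = list(k_fold_indices(n_samples=10, k=5))
for i, (train, val) in enumerate(folds):
    print(f"fold {i}: validate on {val}, train on {len(train)} samples")
```

Each sample appears in exactly one validation fold, so every record contributes to both training and evaluation across the K iterations.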