Abstract

The Domain Name System (DNS) is indispensable for almost all Internet services and has been studied extensively for applications such as anomaly detection. However, a fundamental question remains unanswered: is a DNS query from a querent (i.e., an IP address) triggered by a human or issued by a software entity? Answering this question lets us profile a querent’s behavior from a human-software perspective and helps us understand “who is DNS serving for?”. In this study, we systematically performed querent-centric DNS modeling. Based on in-depth measurements of three real-world DNS datasets of diverse origins, we developed an entropy-based method to distinguish human from non-human queries and proposed a semi-supervised solution that takes a community-level view to detect software entities in a network and estimate their population. The solution not only detects unknown software entities but is also NAT-compatible: it can detect and estimate the software entities of multiple hosts that sit behind NAT and appear as a single querent. An extensive evaluation demonstrates that our approach automatically reveals the distinction between human and non-human domain names and, without relying on prior knowledge and in a NAT-compatible manner, discovers nearly 50% of the software entities and estimates their population using DNS queries.
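The abstract does not say which quantity the entropy-based method measures, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that each query is labeled with the hour of day at which it was sent, and it computes the Shannon entropy of that empirical distribution. The function name `shannon_entropy` and the sample data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return the Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty sample carries no information
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical samples (assumption): hour of day of each query for one domain.
evenly_spread_hours = list(range(24)) * 10          # queries spread over all 24 hours
concentrated_hours = [9, 10, 11, 14, 15, 16] * 40   # queries confined to six hours

print(round(shannon_entropy(evenly_spread_hours), 3))
print(round(shannon_entropy(concentrated_hours), 3))
```

A sample spread evenly over more distinct values yields a higher entropy than one concentrated in a few values. Which feature to measure, and which entropy range indicates human or software activity, would have to come from the full paper.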

