Abstract

Word embeddings are widely used in natural language processing (NLP) to group semantically similar words, and they have also been applied in other areas to capture semantic similarity between entities. In this paper we build a vector embedding for Internet domain names using a corpus of real, anonymized Domain Name System (DNS) query logs from a large Internet Service Provider (ISP). We then use this embedding as a layer of a recurrent neural network (RNN) that acts as a language model for the DNS queries generated by users. We show that this RNN can predict the next DNS query generated by a user with good accuracy, considering the size of the problem. Moreover, we show that training the same RNN without the pre-trained vector model takes longer and is substantially less accurate. The results presented in this work have practical applications in many engineering activities related to DNS architecture design, such as reducing latency in address resolution, optimizing caches in recursive DNS servers, automatically filtering inappropriate domains, and detecting traffic anomalies.
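
The abstract does not include code, but the described architecture can be sketched briefly. The snippet below is a minimal illustration, assuming a pre-trained matrix of domain-name vectors (here called pretrained_vectors, an illustrative name) initializes the embedding layer of an LSTM-based language model that scores the next domain queried in a user session; the dimensions, layer sizes, and toy usage are assumptions for exposition, not values from the paper.

```python
import torch
import torch.nn as nn

class DNSQueryLanguageModel(nn.Module):
    """Next-query prediction over a vocabulary of domain names.

    `pretrained_vectors` is a (vocab_size, embed_dim) tensor of
    domain-name embeddings learned beforehand (word2vec-style, on
    sequences of DNS queries); the name and sizes are illustrative.
    """

    def __init__(self, pretrained_vectors, hidden_dim=256, freeze_embeddings=True):
        super().__init__()
        vocab_size, embed_dim = pretrained_vectors.shape
        # Embedding layer initialized from the pre-trained domain vectors.
        self.embedding = nn.Embedding.from_pretrained(
            pretrained_vectors, freeze=freeze_embeddings
        )
        # Recurrent layer models the order of queries within a user session.
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Output layer scores every domain in the vocabulary as the next query.
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, query_ids):
        # query_ids: (batch, seq_len) integer ids of the domains queried so far.
        embedded = self.embedding(query_ids)      # (batch, seq_len, embed_dim)
        hidden_states, _ = self.rnn(embedded)     # (batch, seq_len, hidden_dim)
        return self.out(hidden_states)            # logits over the vocabulary

# Toy usage with random vectors standing in for real pre-trained embeddings.
vectors = torch.randn(10_000, 128)                # 10k domains, 128-dim embeddings
model = DNSQueryLanguageModel(vectors)
batch = torch.randint(0, 10_000, (4, 20))         # 4 sessions of 20 queries each
logits = model(batch)                             # (4, 20, 10_000)
next_query = logits[:, -1].argmax(dim=-1)         # predicted next domain per session
```

Training without the pre-trained vectors would amount to replacing from_pretrained with a randomly initialized nn.Embedding, which is the comparison the abstract reports as slower and less accurate.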
