Short-time prediction of DNS queries using deep learning and pre-trained word embedding

Merlino Jorge, Pablo Rodríguez-Bocca

doi:10.19153/cleiej.25.2.6


Open Access

https://doi.org/10.19153/cleiej.25.2.6


Abstract

Word embeddings are used in natural language processing to group semantically similar words. In this paper, we create word embeddings for Internet Domain Names (DNS) from corpora of anonymized DNS queries from an Internet Service Provider. We use each embedding as a layer of a recurrent neural network (RNN) that works as a Language Model for the DNS queries generated by the users. We use these RNNs to predict the next DNS query in two different cases. The first case tries to predict the next domain query from the DNS server’s point of view, so the corpus is close to the original log data. The second case tries to predict the next domain queried by a user from the user’s point of view; here the corpus requires heavier preprocessing. We show that this procedure has good accuracy for the DNS server-side problem, but low accuracy for the user-side problem. Moreover, we show that training the same RNN without using the pre-trained embedding takes more time and is substantially less accurate. These results have practical applications for the service’s latency reduction, cache optimization in recursive DNS servers, automatic filtering of inappropriate domains, and anomaly detection.
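
As a rough illustration of the idea in the abstract, the sketch below treats a sequence of DNS queries as a sentence, looks each domain up in an embedding table, and feeds the vectors through a recurrent layer that outputs a probability distribution over the next domain. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy domain names, the dimensions, the plain tanh (Elman-style) recurrent cell, and the random weights that stand in for both the pre-trained embedding and the trained network. With untrained weights the prediction is meaningless; only the data flow is of interest.

```python
import math
import random

# Hypothetical toy sequence of anonymized DNS queries (not the paper's data).
queries = ["example.com", "cdn.example.com", "mail.example.org",
           "example.com", "cdn.example.com"]

# Vocabulary: each distinct domain name plays the role of a "word".
vocab = sorted(set(queries))
index = {domain: i for i, domain in enumerate(vocab)}

EMB_DIM, HID_DIM = 4, 8          # assumed sizes, chosen only for the example
rng = random.Random(0)           # fixed seed so runs are repeatable


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Stand-in for a pre-trained embedding: in practice these rows would be loaded
# from an embedding trained on the query corpus and used as the first layer.
embedding = rand_matrix(len(vocab), EMB_DIM)

# Untrained recurrent and output weights (placeholders for a trained model).
W_xh = rand_matrix(HID_DIM, EMB_DIM)
W_hh = rand_matrix(HID_DIM, HID_DIM)
W_hy = rand_matrix(len(vocab), HID_DIM)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_query_distribution(history):
    """Run the recurrent cell over the history and score every domain."""
    hidden = [0.0] * HID_DIM
    for domain in history:
        x = embedding[index[domain]]              # embedding lookup layer
        hidden = [math.tanh(a + b)
                  for a, b in zip(matvec(W_xh, x), matvec(W_hh, hidden))]
    return softmax(matvec(W_hy, hidden))


probs = next_query_distribution(queries[:3])
predicted = vocab[max(range(len(vocab)), key=probs.__getitem__)]
print(predicted, probs)
```

In a realistic setting the embedding rows would come from a model trained on the query corpus, and the recurrent and output weights would be learned by minimizing cross-entropy against the actual next query in the log.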

Full Text