Because of the critical role of the domain name system (DNS), its query log data are used for a variety of network monitoring purposes. As network services diversify, these data have become increasingly complex, which makes it difficult to mine useful information from them. DNS query log data can be viewed as the superposition of two types of communication patterns: groups of domains that are accessed simultaneously (e.g., ad servers and content delivery network (CDN) servers) and time-series access patterns that reflect user behavior (e.g., access trends during the night). However, previous studies have not focused on extracting both types of access pattern hidden in the data. This study proposes a method that extracts both accessed-domain patterns and temporal access patterns as user communication behaviors from DNS query log data and predicts future accesses based on these patterns. The proposed method first aggregates similar fully qualified domain names (FQDNs) associated with the same service. We then present temporal regularized nonnegative tensor factorization (TR-NTF), which extracts both access patterns from a third-order tensor representing the DNS query log data and enables prediction. We evaluate the proposed method using synthetic and real data and demonstrate that it successfully extracts hidden communication patterns and achieves sufficient prediction accuracy.
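To make the tensor formulation concrete, the following is a minimal sketch of how query log records might be counted into a third-order tensor and factorized under nonnegativity constraints. It is not the TR-NTF method itself: it uses a plain nonnegative CP factorization with multiplicative updates and omits the temporal regularization and the FQDN aggregation step. The tensor modes (user, domain, time slot), the function names, and the integer-indexed record format are all assumptions made for illustration.

```python
import numpy as np

def build_tensor(records, n_users, n_domains, n_slots):
    """Count queries into a (user, domain, time-slot) tensor.

    `records` is an iterable of (user_idx, domain_idx, slot_idx) tuples;
    this record format is an assumption, not taken from the paper.
    """
    X = np.zeros((n_users, n_domains, n_slots))
    for user, domain, slot in records:
        X[user, domain, slot] += 1.0
    return X

def ntf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain nonnegative CP factorization (no temporal regularization).

    Uses multiplicative updates for the squared Frobenius error, so the
    factors stay nonnegative when initialized with nonnegative values.
    """
    rng = np.random.default_rng(seed)
    n_i, n_j, n_k = X.shape
    A = rng.random((n_i, rank))  # user factors
    B = rng.random((n_j, rank))  # domain factors
    C = rng.random((n_k, rank))  # time-slot factors
    for _ in range(n_iter):
        # Numerator: tensor contracted with the other two factors.
        # Denominator: current factor times the Hadamard product of Grams.
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

def reconstruct(A, B, C):
    """Rebuild the tensor as a sum of rank-one components."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

In this sketch, each column of `B` would correspond to a group of domains that tend to be accessed together, and the matching column of `C` to the temporal profile of that group. A predictive variant would additionally need a model of how the time-slot factors evolve, which is the role the temporal regularization plays in the proposed method.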