Abstract
Recent malware families have widely adopted domain generation algorithms (DGAs), primarily because a DGA can generate a large number of domain names and then use only a small subset of them for actual command and control (C&C) server communication. DNS blacklisting combined with sinkholing is the most commonly used approach to block DGA C&C traffic. This is a daunting task, because the network administrator has to update the DNS blacklist continuously to keep up with the constantly changing behavior of DGAs. Another significant direction is to intercept the DNS queries in DNS traffic and predict whether a domain name is DGA generated. Most existing methods identify groups of domains through clustering, estimate statistical properties for each group, and classify the groups using statistical tests. This approach requires a large time window and therefore cannot be used for real-time DGA domain detection. Additionally, these techniques rely on passive DNS and NXDomain information. Integrating these different sources of information is costly and, in some cases, infeasible under real-time constraints. Detecting DGAs on a per-domain basis is an alternative approach that requires no additional information. The existing per-domain detection methods are based on machine learning. They rely on feature engineering, which is time-consuming and can be easily circumvented by malware authors. More recently, deep learning has been applied to per-domain DGA detection. It requires no feature engineering and is not easily circumvented. In all the existing studies of DGA detection, the deep learning architectures performed well in comparison to the classical machine learning algorithms (CMLAs).
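To make the feature engineering step mentioned above concrete, the following is a minimal sketch of the kind of hand-crafted lexical features a per-domain classifier might compute. The specific features (length, character entropy, digit ratio, vowel ratio) and the names `shannon_entropy` and `handcrafted_features` are illustrative assumptions, not the feature set of any particular study; the sketch also assumes that only the leftmost label of the domain is analyzed.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy (in bits) of the character distribution of s."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def handcrafted_features(domain: str) -> dict:
    """Compute a few illustrative lexical features for one domain name.

    Assumption: only the leftmost label (the part before the first dot)
    is analyzed, e.g. "abc123" for "abc123.example.com".
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    return {
        "length": n,
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n if n else 0.0,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n if n else 0.0,
    }
```

Each such feature has to be designed, validated, and maintained by hand, and an attacker who knows the feature set can shape generated names to evade it, which is the weakness the abstract attributes to this approach.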
In this chapter, we propose a deep learning based framework named I-DGA-DC-Net, which is composed of two modules: a Domain name similarity checker and a Domain name statistical analyzer. The Domain name similarity checker uses a deep learning architecture and is compared against classical string comparison methods. These experiments are run on a publicly available data set. Domains that the similarity checker does not flag are then passed to the statistical analyzer. The analyzer takes raw domain names as input, captures the optimal features implicitly by passing them through a character-level embedding followed by deep learning layers, and classifies them using the CMLAs. Moreover, using the AmritaDGA data set, the effectiveness of the CMLAs in categorizing algorithmically generated domains into their corresponding malware families is compared with that of a fully connected layer with a \(\textit{softmax}\) non-linear activation function. All experiments with deep learning architectures are run for 100 epochs with a learning rate of 0.01. The deep learning-CMLA models achieved higher test accuracy than the deep learning-\(\textit{softmax}\) models. This is because deep learning architectures are good at extracting high-level features, while a support vector machine (SVM) is good at constructing decision surfaces from optimal features. An SVM generally cannot learn complicated, abstract, and invariant features on its own, whereas the hidden layers of deep learning architectures help to capture them.
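As a rough illustration of two ingredients named above, the sketch below shows (a) how a raw domain name might be turned into a fixed-length sequence of character indices, the usual input to a character-level embedding layer, and (b) Levenshtein edit distance as one example of a classical string comparison method. The alphabet, the padding length `MAX_LEN`, and the function names are assumptions made for illustration; the abstract does not specify the encoding or which string comparison methods were used.

```python
import string

# Assumed alphabet: lowercase letters, digits, and the symbols "-", ".", "_".
ALPHABET = string.ascii_lowercase + string.digits + "-._"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # 0 is reserved for padding
UNKNOWN_INDEX = len(ALPHABET) + 1  # index for any character outside ALPHABET
MAX_LEN = 64  # assumed fixed sequence length


def encode_domain(domain: str, max_len: int = MAX_LEN) -> list:
    """Map a domain name to a zero-padded list of character indices."""
    ids = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

For example, `edit_distance("google.com", "g00gle.com")` is 2, so a simple threshold on the distance to known legitimate domains would flag the second name as a near-duplicate. The index sequences from `encode_domain` would be fed to an embedding layer in whatever deep learning library is used.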