Abstract

Contemporary malware families typically use domain generation algorithms (DGAs) to circumvent DNS blacklists, sinkholing, and other security systems. A compromised system uses a DGA to generate a large number of pseudo-random domain names from a seed, then uses a subset of those names to contact the command and control server (C2C). To block this communication channel, security organizations reverse engineer malware samples to identify the DGA and its seed. The resulting lists of domain names are then sinkholed or preregistered in a DNS blacklist. This task is tedious, and a DNS blacklist can detect only DGA-based domain names that are already known. Such systems are also easily circumvented by DGA malware authors. An alternative is to intercept DNS packets and classify domain names based on statistical features. This type of system uses contextual data such as passive DNS and NXDomain responses. Building a DGA detector on contextual data is difficult because all of the data must be aggregated, which is costly in a real-time environment, and because real-world constraints often make contextual information hard to obtain on an endpoint system. More recently, methods that detect DGA domain names on a per-domain basis have been adopted. These methods do not rely on any external information and use only the full domain name. Many works detect DGAs on a per-domain basis, using either manual feature engineering with classical machine learning (CML) algorithms or automatic feature engineering with deep learning architectures. Methods based on deep learning architectures perform better than those based on CML algorithms. Deep learning based DGA detection methods are also more robust in an adversarial environment than CML classifiers.
However, deep learning architectures are vulnerable to the multiclass imbalance problem, which is becoming increasingly important in DGA domain detection. This is mainly because many DGA families have very few samples in the training data set. In this work, we propose DeepDGA-MINet, which collects DNS information inside an Ethernet LAN and uses Cost-Sensitive deep learning architectures to handle the multiclass imbalance problem. This is done by introducing cost items into the backpropagation procedure to reflect the relative importance of each DGA family. The performance of the Cost-Sensitive deep learning architectures is evaluated on the AmritaDGA benchmark data set. The Cost-Sensitive deep learning architectures performed well when compared to the original deep learning architectures.
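
The idea of introducing cost items into backpropagation can be illustrated with a class-weighted cross-entropy loss. The sketch below is only an illustration under stated assumptions, not the DeepDGA-MINet implementation: the abstract does not say how the cost items are computed, so inverse class frequency is assumed here, and the function names and the labels `benign`, `family_a`, and `family_b` are hypothetical.

```python
import math
from collections import Counter


def class_costs(labels):
    """Assumed cost scheme: inverse class frequency, so rare families cost more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_cross_entropy(logits, true_class, costs, classes):
    """Cross-entropy for one sample, scaled by the cost of its true class."""
    probs = softmax(logits)
    idx = classes.index(true_class)
    return -costs[true_class] * math.log(probs[idx])


def loss_gradient(logits, true_class, costs, classes):
    """Gradient of the weighted loss w.r.t. the logits: cost * (softmax - one_hot).

    This is the quantity backpropagation would push into the network, so a
    larger cost produces a proportionally larger update for rare families.
    """
    probs = softmax(logits)
    idx = classes.index(true_class)
    return [
        costs[true_class] * (p - (1.0 if i == idx else 0.0))
        for i, p in enumerate(probs)
    ]


# Hypothetical, heavily imbalanced training labels.
labels = ["benign"] * 90 + ["family_a"] * 9 + ["family_b"] * 1
costs = class_costs(labels)
classes = sorted(costs)

# Same (uninformative) prediction, different true class: the rare family
# yields a larger loss and a larger gradient than the majority class.
logits = [0.0, 0.0, 0.0]
loss_rare = cost_sensitive_cross_entropy(logits, "family_b", costs, classes)
loss_common = cost_sensitive_cross_entropy(logits, "benign", costs, classes)
grad_rare = loss_gradient(logits, "family_b", costs, classes)
```

With these assumed costs, a misclassified sample from the single-sample family contributes far more to the loss than a benign sample does, which is the general mechanism by which a cost-sensitive loss counteracts class imbalance during training.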
