Abstract

Abstract In recent years, botnets have become a major threat on the Internet. Most sophisticated bots use Domain Generation Algorithms (DGA) to pseudo-randomly generate a large number of domains and select a subset in order to communicate with Command and Control (C&C) server. The basic aim is to avoid blacklisting, sinkholing and evade the security systems. Long Short-Term Memory network (LSTM) provides a mean to combat this botnet type. It operates on raw domains and is amenable to immediate applications. LSTM is however prone to multiclass imbalance problem, which becomes even more significant in DGA malware detection. This is due the fact that many DGA classes have a very little support in the training dataset. This paper presents a novel LSTM.MI algorithm to combine both binary and multiclass classification models, where the original LSTM is adapted to be cost-sensitive. The cost items are introduced into backpropagation learning procedure to take into account the identification importance among classes. Experiments are carried out on a real-world collected dataset. They demonstrate that LSTM.MI provides an improvement of at least 7% in terms of macro-averaging recall and precision as compared to the original LSTM and other state-of-the-art cost-sensitive methods. It is also able to preserve the high accuracy on non-DGA generated class (0.9849 F1-score), while helping recognize 5 additional bot families.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.