Abstract

Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective because attackers use techniques such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive and more effective malware detection methods has been identified as an urgent requirement for protecting IT infrastructure against such threats and for ensuring its security. In this paper, we investigate an alternative method for malware detection based on N-grams and machine learning. We use a dynamic analysis technique to extract Indicators of Compromise (IOCs) for malicious files, and these IOCs are then represented using N-grams. The paper also proposes TF-IDF as a novel alternative for identifying the most significant N-gram features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy among the classifiers evaluated.
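The abstract does not specify the N-gram length, how IOCs are encoded, or how TF-IDF scores are aggregated into a feature ranking. The sketch below is therefore only an illustration under stated assumptions: each IOC is taken to be a sequence of API-call names recorded during dynamic analysis, TF-IDF uses the plain tf · log(N/df) weighting, and each N-gram is ranked by its maximum TF-IDF weight across files. The traces and the helper names `ngrams`, `tfidf_scores`, and `top_k_features` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token sequence as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_scores(docs, n):
    """Return one {n-gram: TF-IDF weight} dict per document.

    tf(t, d) = count of t in d / number of n-grams in d
    idf(t)   = log(N / df(t)), with df(t) = number of documents containing t
    """
    grams_per_doc = [Counter(ngrams(doc, n)) for doc in docs]
    df = Counter()
    for grams in grams_per_doc:
        df.update(grams.keys())  # each n-gram counted once per document
    n_docs = len(docs)
    scores = []
    for grams in grams_per_doc:
        total = sum(grams.values())  # an empty Counter yields an empty dict below, so no division occurs
        scores.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in grams.items()})
    return scores


def top_k_features(docs, n, k):
    """Rank n-grams by their maximum TF-IDF weight over all documents (an assumed aggregation) and keep k."""
    best = {}
    for doc_scores in tfidf_scores(docs, n):
        for t, w in doc_scores.items():
            best[t] = max(best.get(t, 0.0), w)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [t for t, _ in ranked[:k]]


# Hypothetical API-call traces, one per analysed file.
traces = [
    ["OpenFile", "ReadFile", "WriteFile", "CloseFile"],
    ["OpenFile", "ReadFile", "CloseFile"],
    ["RegOpenKey", "RegSetValue", "CreateProcess", "WriteFile"],
]
print(top_k_features(traces, n=2, k=2))  # → ['ReadFile CloseFile', 'CreateProcess WriteFile']
```

N-grams that occur in many traces (here `OpenFile ReadFile`) receive a lower IDF and therefore rank below rarer, more distinctive sequences. Other aggregation choices, such as the mean or summed TF-IDF weight per N-gram, are equally plausible readings of the abstract.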

Highlights

  • We aim to improve the current state of the art in malware analysis by presenting the design and experimental evaluation of a malware detection system, with the following contributions: (a) Malware behavioural modelling using an advanced sandbox: other studies and research work have used traditional sandboxes such as Cuckoo, Norman, and Joe to model the behaviour of malware, but our previous research [36] found that these sandboxes are not very effective at capturing the behaviour of advanced and sophisticated malware; in this work we therefore use an AI-based sandbox to perform dynamic analysis and to model the behaviour of the malware

  • (c) Optimise Classification: We present the design of a classification system that uses Naive Bayes, Decision Tree, and Random Forest classifiers to detect malware using new features
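The highlights do not describe how the classifiers were implemented. As a minimal sketch, assuming each sample is represented as counts of the selected N-gram features, a multinomial Naive Bayes classifier (one of the three named above) can be written in plain Python with add-alpha (Laplace) smoothing. The class name `MultinomialNB` here is a hypothetical stand-alone implementation, not a library import, and the training samples and labels are invented for illustration.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over n-gram count features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, samples, labels):
        # Each sample is a Counter mapping an n-gram feature to its count in one trace.
        self.class_counts = Counter(labels)
        self.classes = sorted(self.class_counts)
        self.feature_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for sample, label in zip(samples, labels):
            self.feature_counts[label].update(sample)
            self.vocab.update(sample)
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, sample, c):
        # log P(c) + sum_f count(f) * log P(f | c); features unseen in training are ignored.
        n_samples = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_samples)
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for feature, count in sample.items():
            if feature in self.vocab:
                score += count * math.log((self.feature_counts[c][feature] + self.alpha) / denom)
        return score

    def predict(self, sample):
        # Return the class with the highest (unnormalised) log posterior.
        return max(self.classes, key=lambda c: self.log_posterior(sample, c))


# Hypothetical training data: bigram counts extracted from sandbox traces.
train = [
    Counter({"RegSetValue CreateProcess": 2, "CreateProcess WriteFile": 1}),
    Counter({"RegOpenKey RegSetValue": 1, "RegSetValue CreateProcess": 1}),
    Counter({"OpenFile ReadFile": 2, "ReadFile CloseFile": 1}),
    Counter({"OpenFile ReadFile": 1, "ReadFile WriteFile": 1}),
]
labels = ["malware", "malware", "benign", "benign"]
clf = MultinomialNB().fit(train, labels)
print(clf.predict(Counter({"RegSetValue CreateProcess": 1})))  # → malware
```

The same count vectors could equally be passed to decision-tree or random-forest learners, for example scikit-learn's `DecisionTreeClassifier` and `RandomForestClassifier`, after converting them to a fixed-width feature matrix.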

Summary

Introduction

Malware is a broad term that refers to any piece of software designed intentionally to damage the normal functionality of a computer or a network [1]. Current malware targets victims widely and indiscriminately, from individuals and residential customers to IT systems within large organisations and critical country-wide infrastructures (including nuclear plants and water supply systems) that have traditionally been considered highly secure [2]. Within this spectrum, recent reports [3,4] indicate a significant increase in the production of malware variants that target critical infrastructures. Existing malware variants are continuously evolving as malware writers improve their detection-avoidance mechanisms.
