Abstract
Static malware analysis is well-suited to endpoint anti-virus systems as it can be conducted quickly by examining the features of an executable piece of code and matching it to previously observed malicious code. However, static code analysis can be vulnerable to code obfuscation techniques. Behavioural data collected during file execution is more difficult to obfuscate, but takes a relatively long time to capture, typically up to 5 min, meaning the malicious payload has likely already been delivered by the time it is detected. In this paper we investigate the possibility of predicting whether or not an executable is malicious based on a short snapshot of behavioural data. We find that an ensemble of recurrent neural networks is able to predict whether an executable is malicious or benign within the first 5 s of execution with 94% accuracy. This is the first time general types of malicious file have been predicted to be malicious during execution rather than by using a complete activity log file post-execution. It enables cyber security endpoint protection to be advanced to use behavioural data for blocking malicious payloads, rather than detecting them post-execution and having to repair the damage.
Highlights
Automatic malware detection is necessary to process the rapidly rising rate and volume of new malware being generated
The main contributions of this paper are: 1. We propose a recurrent neural network (RNN) model to predict malicious behaviour using machine activity data and demonstrate its capabilities are superior to other machine learning solutions that have previously been used for malware detection
The code used to implement the following experiments can be found at https://github.com/mprhode/malware-prediction-rnn
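The idea in the highlights above can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above). It only shows the shape of the inference path: a short snapshot of machine activity data is fed step by step into several small recurrent networks, and their outputs are combined. The four activity metrics, the one-sample-per-second rate, the random untrained weights, the averaging rule, and the names `make_rnn`, `rnn_predict` and `ensemble_predict` are all assumptions made for illustration.

```python
import math
import random


def make_rnn(n_features, n_hidden, seed):
    """Build a tiny Elman-style RNN with random weights.

    Hypothetical stand-in for a trained model; no training happens here.
    """
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_in": rand_matrix(n_hidden, n_features),   # input -> hidden
        "w_rec": rand_matrix(n_hidden, n_hidden),    # hidden -> hidden
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    }


def rnn_predict(model, snapshot):
    """Consume the snapshot one time step at a time and return P(malicious)."""
    hidden = [0.0] * len(model["w_out"])
    for features in snapshot:
        # New hidden state depends on the current input and the previous state.
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(in_row, features))
                + sum(w * h for w, h in zip(rec_row, hidden))
            )
            for in_row, rec_row in zip(model["w_in"], model["w_rec"])
        ]
    logit = sum(w * h for w, h in zip(model["w_out"], hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


def ensemble_predict(models, snapshot, threshold=0.5):
    """Average the member probabilities (an assumed combination rule)."""
    probs = [rnn_predict(m, snapshot) for m in models]
    mean_prob = sum(probs) / len(probs)
    return mean_prob, mean_prob >= threshold


# Hypothetical trace: one row per second, four normalised activity metrics.
trace = [
    [0.10, 0.20, 0.00, 0.05],
    [0.35, 0.22, 0.10, 0.05],
    [0.80, 0.30, 0.40, 0.10],
    [0.75, 0.45, 0.60, 0.15],
    [0.90, 0.50, 0.70, 0.20],
    [0.20, 0.50, 0.10, 0.20],
    [0.15, 0.50, 0.05, 0.20],
]
snapshot = trace[:5]  # keep only the first 5 s, assuming 1 sample per second
models = [make_rnn(n_features=4, n_hidden=8, seed=s) for s in range(3)]
mean_prob, is_malicious = ensemble_predict(models, snapshot)
print(f"P(malicious) = {mean_prob:.3f}, flagged = {is_malicious}")
```

Because the weights are random, the printed probability carries no meaning. The point is that a decision is available as soon as the first few seconds of activity have been observed, without waiting for a complete activity log.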
Summary
Automatic malware detection is necessary to process the rapidly rising rate and volume of new malware being generated. Automatic malware detection used in anti-virus systems compares (features extracted from) the code of an incoming file to a known list of malware signatures. This form of filtering using static data is unsuited to detecting completely new (“zero-day”) malware. Behavioural analysis approaches assume that malware cannot avoid leaving a measurable footprint as a result of the actions necessary for it to achieve its aims. However, collecting behavioural data requires executing the file, which incurs a time penalty by comparison with static analysis. Whilst dynamic data can lead to more accurate and resilient detection models than static data ([4], [5], [6]), in practice behavioural data is rarely used in commercial endpoint anti-virus systems due to this time penalty. It is inconvenient and inefficient to wait for several minutes whilst a single file is analysed, and the malicious payload has likely been delivered by the end of the analysis window, so the opportunity to block malicious actions has been missed.