Malware refers to software that is designed to achieve a malicious purpose, usually to benefit its creator. To accomplish this, malware hides its true purpose from its target and from malware analysts until it has established a foothold on the victim’s machine. Malware analysts therefore have to find increasingly sophisticated detection methods, prompting malware authors to equip their malware with ever more evasive techniques. Dynamic malware analysis has been framed as a potential solution, since it runs malware in its preferred environment so that its true behaviour can be observed. In practice, however, the analysis environment is usually a restricted form of that preferred environment, and samples may be run for only two minutes or less. If malware does not demonstrate its malicious intent within that time frame and environment, the behaviour observed and subsequently learned may not be the behaviour that needs to be prevented. There is a risk that classifiers trained using the standard dynamic malware analysis process will recognise malware only by its evasive behaviour rather than by a mix of behaviours. In this paper, we study the extent to which classifiers depend on evasive behaviour when identifying malware. We achieve this by training them on real ransomware and benignware and then testing their ability to detect carefully crafted simulated ransomware. The simulated ransomware gives us the freedom to create samples with different levels of evasive and malicious behaviour. The simulated samples, like the real samples, are run in a sandboxed environment where data is collected at both user and kernel level. The results of our experiments indicate that, in general, the classifiers were more likely to label the simulated samples as malicious once the amount of evasive behaviour in a sample went beyond a threshold. Generally, this threshold was crossed when the simulated ransomware waited 2 s or more between each file it encrypted. Additionally, the classifiers trained on user-level data were less robust to small changes in the system calls made, whereas those trained on system calls gathered at a kernel, system-wide level produced less variable results. Finally, in attempting to simulate malware for our experiments, we discovered that the field of malware simulation is relatively unstudied despite its potential, and we therefore provide recommendations for simulating malware for system-call analysis.
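To make the varying-evasiveness setup concrete, the following is a minimal, hedged sketch of the kind of simulated-ransomware loop described above, with a configurable delay between file encryptions acting as the evasiveness knob. The file names, the toy XOR "cipher", and the delay_seconds parameter are illustrative assumptions, not the paper's actual implementation; the sketch only touches dummy files it creates in its own temporary directory, so it is harmless to run.

    # Hypothetical sketch: simulated ransomware with a configurable pause
    # between file encryptions (the evasive-waiting behaviour discussed above).
    # It creates and "encrypts" its own dummy files and leaves them in place.
    import os
    import time
    import tempfile

    def xor_encrypt(data: bytes, key: bytes) -> bytes:
        """Toy stream 'encryption' (XOR with a repeating key); a stand-in for
        whatever cipher a real simulator would use."""
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    def run_simulation(num_files: int = 5, delay_seconds: float = 2.0) -> None:
        """Create dummy files, then write an 'encrypted' copy of each one,
        sleeping delay_seconds between files. Varying delay_seconds models
        more or less evasive waiting between encryptions."""
        key = os.urandom(16)
        workdir = tempfile.mkdtemp(prefix="ransim_")

        # Populate the sandbox directory with dummy victim files.
        for i in range(num_files):
            with open(os.path.join(workdir, f"victim_{i}.txt"), "wb") as f:
                f.write(f"dummy document {i}\n".encode())

        # Encrypt each dummy file to a .enc copy, pausing between files.
        for name in sorted(os.listdir(workdir)):
            if name.endswith(".enc"):
                continue
            path = os.path.join(workdir, name)
            with open(path, "rb") as f:
                ciphertext = xor_encrypt(f.read(), key)
            with open(path + ".enc", "wb") as f:
                f.write(ciphertext)
            time.sleep(delay_seconds)  # evasive pause between encryptions

        print(f"Simulated run complete in {workdir} (delay={delay_seconds}s)")

    if __name__ == "__main__":
        # A 2-second delay corresponds to the threshold reported in the abstract.
        run_simulation(num_files=5, delay_seconds=2.0)

In an experiment along these lines, sweeping delay_seconds from 0 upwards while tracing the process's system calls would yield samples whose evasive behaviour varies while the malicious behaviour (file encryption) stays fixed.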