Crypto-ransomware attacks have increased substantially in recent years, and given how profitable they are, this growth is likely to continue. To better understand this malware and to help developers of ransomware detection systems build more robust and reliable solutions, this study investigates ransomware behavior during the destruction phase through behavioral feature analysis. We used a dataset of 1524 samples and 30 967 features representing the actions performed by 582 ransomware samples and 942 benign applications (goodware). Six representative and widely used classification algorithms served as auxiliary tools for investigating the behavior of these attacks: Naive Bayes (NB), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM). Using RF on 462 features of the resulting dataset, we achieved an accuracy of 98.48%, balanced accuracy of 98.35%, precision of 98.17%, recall of 97.82%, F-measure of 97.98%, and ROC AUC of 99.87%. We propose a new criterion for determining feature-group relevance and a method for identifying the features most strongly associated with ransomware and with goodware. Our main conclusions are as follows: Application Programming Interface (API) calls are the most relevant feature group, achieving a balanced accuracy of 96.49% on their own; native Windows encryption APIs are not crucial for ransomware classification; and the most significant ransomware features tend to involve thread/process handling, physical memory operations, and communication, whereas goodware features more often indicate virtual memory, file, directory, and resource operations.
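Because the dataset is imbalanced (582 ransomware vs. 942 goodware samples), balanced accuracy is reported alongside plain accuracy. The sketch below illustrates how the listed metrics are conventionally computed from confusion-matrix counts. It assumes ransomware is the positive class and that "F-measure" means the F1 score. The class name `ConfusionCounts`, the function `metrics`, and the counts in the usage example are hypothetical, chosen only for illustration. This is not the authors' code, and the counts are not the study's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container; ransomware is assumed to be the positive class."""
    tp: int  # ransomware correctly classified as ransomware
    fp: int  # goodware wrongly classified as ransomware
    tn: int  # goodware correctly classified as goodware
    fn: int  # ransomware wrongly classified as goodware


def metrics(c: ConfusionCounts) -> dict:
    """Compute standard binary-classification metrics from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)        # true-positive rate (sensitivity)
    specificity = c.tn / (c.tn + c.fp)   # true-negative rate
    # F1: harmonic mean of precision and recall (assumed meaning of "F-measure")
    f_measure = 2 * precision * recall / (precision + recall)
    # Balanced accuracy: mean of the per-class recalls, robust to class imbalance
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Illustrative counts only: 100 positives, 200 negatives
    example = metrics(ConfusionCounts(tp=80, fp=10, tn=190, fn=20))
    print(example["accuracy"])           # 0.9
    print(example["balanced_accuracy"])  # 0.875
```

With these made-up counts, plain accuracy (0.9) is higher than balanced accuracy (0.875) because the majority class is classified better than the minority class. This gap is the kind that balanced accuracy is meant to expose on an imbalanced dataset.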