A comprehensive analysis combining structural features for detection of new ransomware families

Caio C. Moreira, Davi C. Moreira, Claudomiro Sales

doi:10.1016/j.jisa.2024.103716

Abstract

This study presents a comprehensive static analysis method that combines multiple structural features extracted from Windows executable files. The method employs a soft-voting ensemble of three machine learning techniques: Logistic Regression (LR), Random Forest (RF), and eXtreme Gradient Boosting (XGB). The proposed model aims to identify newly emerged ransomware families by analyzing header fields, imported Dynamic-Link Libraries (DLLs), function calls, and section entropy. To assess the method's efficacy in detecting zero-day ransomware families, we created a dataset of 2675 binary samples. The training set consisted of 1023 samples from 25 relevant ransomware families and 1134 samples of benign applications (goodware). The testing set comprised 385 samples from 15 recent ransomware families and 133 goodware samples. For the Detection of New Ransomware Families (DNRF), the model achieved weighted averages of 97.53% accuracy, 96.36% precision, 97.52% recall, and 96.41% F-measure. In addition, scanning and prediction took 0.37 s on average. These results demonstrate the model's adaptability to the ever-changing ransomware landscape while keeping testing times reasonable, which makes it suitable as an additional security layer in antivirus protection systems on low-resource hardware devices. Furthermore, we used the SHapley Additive exPlanations (SHAP) interpretation method to establish trust in the model and gain insight into its decision-making process. The method can assist developers of ransomware detection systems in building more resilient, dependable, real-time solutions.
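Two ingredients named in the abstract can be illustrated in isolation: the per-section entropy feature and the soft-voting rule that merges the LR, RF, and XGB outputs. The Python sketch below is not the authors' implementation. It assumes Shannon entropy measured in bits per byte (the abstract does not state the exact formula) and an unweighted average of class probabilities (the abstract does not mention ensemble weights). The function names `shannon_entropy` and `soft_vote`, and the probability values, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0).

    Assumed definition for a PE section's raw data; packed or encrypted
    sections tend toward 8.0, highly repetitive ones toward 0.0.
    """
    if not data:
        return 0.0
    n = len(data)
    entropy = 0.0
    for count in Counter(data).values():  # iterating bytes yields ints 0..255
        p = count / n
        entropy -= p * math.log2(p)
    return entropy


def soft_vote(probas, weights=None):
    """Soft voting: average per-model class probabilities, then take the argmax.

    probas: one probability vector per model, e.g. [P(goodware), P(ransomware)].
    weights: optional per-model weights; equal weights (assumed) if omitted.
    Returns (predicted class index, averaged probability vector).
    """
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda k: avg[k])
    return label, avg


if __name__ == "__main__":
    # A uniform byte distribution has maximal entropy; a constant run has none.
    print(shannon_entropy(bytes(range(256))))  # 8.0
    print(shannon_entropy(b"\x00" * 64))       # 0.0

    # Hypothetical LR, RF and XGB outputs for one sample, ordered as
    # [goodware, ransomware]; the values are illustrative, not from the paper.
    lr, rf, xgb = [0.40, 0.60], [0.55, 0.45], [0.20, 0.80]
    label, avg = soft_vote([lr, rf, xgb])
    print(label, [round(p, 3) for p in avg])  # 1 [0.383, 0.617]
```

In this example, RF alone would favor goodware, but the averaged probabilities still favor ransomware. This is the behavior soft voting is meant to provide over a single model. In a scikit-learn pipeline, `VotingClassifier` with `voting="soft"` applies the same average-then-argmax rule to the fitted estimators' `predict_proba` outputs, and its `weights` parameter covers the weighted case.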

Full Text