Abstract
Machine learning (ML) has emerged as a powerful tool for detecting and mitigating malware in the face of evolving cybersecurity threats. This paper presents an overview of ML techniques applied to Windows Portable Executable (PE) malware detection, spanning theoretical foundations and practical implementations. It first explores theoretical underpinnings such as feature engineering, model selection, and evaluation metrics, then discusses practical aspects including data preprocessing, model training, and deployment considerations. The experimental setup on the Dataiku platform is detailed, and seven ML models are evaluated on both binary and multiclass classification tasks, before and after applying Principal Component Analysis (PCA). The performance and interpretability of these models are analyzed using SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations). By synthesizing insights from theory and practice, this paper aims to provide a comprehensive understanding of ML approaches for Windows PE malware detection and to guide future advancements in cybersecurity.