Zero-day malware is a significant cybersecurity threat because it is unknown to antivirus systems and can cause serious damage before it is detected. Traditional malware detection methods rely on signatures and patterns specific to known malware, so they are ineffective against zero-day malware that has not been encountered before. Machine learning has shown promise in detecting and classifying unknown threats, including zero-day malware. In this research, we propose using machine learning classifiers to detect and classify zero-day malware. The selected classifiers are Random Forest and XGBoost, both well known and widely used in machine learning. To evaluate the effectiveness of our approach, we first collect and pre-process a dataset of known malware. The dataset, Meraz'18, is used to train and test the selected classifiers. It contains PE header features obtained through static analysis, including the entropy calculated for each section. The dataset represents both benign and malicious files and has been used in previous studies on zero-day malware detection, namely during the payload injection phase. To reduce the risk of overfitting, 10-fold cross-validation is used. We analyze the performance of the classifiers on the known malware dataset using accuracy, precision, recall, F1-score, and Cohen’s kappa, and evaluate their ability to detect and classify zero-day malware. Hyperparameter tuning is applied to each model to obtain its best performance. The results show that both classifiers perform extremely well, each achieving up to almost 98% accuracy. Using machine learning classifiers for zero-day malware detection and classification can significantly improve cybersecurity by providing a way to detect and protect against unknown threats. This work is an essential step towards the development of more robust cybersecurity systems that can effectively protect against unknown threats.
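As an illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes accuracy, precision, recall, F1-score, and Cohen's kappa for a binary benign/malicious labelling. It is not the authors' evaluation code: the names `BinaryMetrics` and `binary_metrics`, the label encoding (1 = malicious, 0 = benign), and the toy labels in the usage lines are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for the five metrics listed in the abstract."""
    accuracy: float
    precision: float
    recall: float
    f1: float
    kappa: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute binary classification metrics from paired label sequences.

    Assumes `positive` marks the malicious class; any other label is
    treated as benign. Undefined ratios (zero denominator) are reported as 0.0.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance given the marginal label frequencies.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return BinaryMetrics(accuracy, precision, recall, f1, kappa)


# Toy labels for illustration only (not taken from the Meraz'18 dataset).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_result = binary_metrics(toy_true, toy_pred)
```

Cohen's kappa is worth reporting next to accuracy because it discounts the agreement a classifier would reach by chance, which matters when benign and malicious samples are not equally represented.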