The rapid advancement of malware poses a significant threat to devices such as personal computers and mobile phones. Malicious software, including viruses, worms, trojan horses, and ransomware, is among the most serious threats these devices face. Conventional antivirus software is becoming ineffective against the ever-evolving nature of malware, which now includes polymorphic, metamorphic, and oligomorphic variants. These advanced malware types not only replicate and distribute themselves, but also give each offspring a unique fingerprint. To address this challenge, a new generation of antivirus software based on machine learning is needed. Such an approach can detect malware based on its behavior, rather than relying on outdated fingerprint-based methods. This study explored the integration of machine learning models for malware detection using several ensemble algorithms and feature selection techniques. It compared three ensemble algorithms: Gradient Boosting, Random Forest, and AdaBoost. Information Gain was used for feature selection, with 21 features analyzed. The study employed a public dataset, ‘Malware Static and Dynamic Features VxHeaven and VirusTotal Data Set’, which encompasses both static and dynamic malware features. The results show that the Gradient Boosting algorithm combined with Information Gain feature selection achieved the highest performance, reaching 99.2% for both accuracy and F1-score.
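The abstract does not spell out how Information Gain is computed, so the following is only a minimal sketch of the standard definition, IG(Y; X) = H(Y) - H(Y | X), applied to discrete features and used to rank them. The feature names, the toy data, and the helper functions (`entropy`, `information_gain`, `select_top_k`) are hypothetical and are not taken from the study or its dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    # Group the labels by the value the feature takes on each sample.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy of each group, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(features, labels, k):
    """Return the names of the k features with the highest information gain."""
    scores = {name: information_gain(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, made-up data: 1 = malware, 0 = benign. Feature names are hypothetical.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "writes_registry": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly predictive
    "has_gui":         [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    "packed_section":  [1, 1, 1, 0, 1, 0, 0, 0],  # partially predictive
}

top = select_top_k(features, labels, k=2)
print(top)
```

In a pipeline like the one the abstract describes, the retained features would then be passed to an ensemble classifier such as Gradient Boosting, Random Forest, or AdaBoost. Continuous features would first need discretization or an estimator of mutual information, which this sketch does not cover.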