In this survey, we review the key developments in the field of malware detection using AI and analyze core challenges. We systematically survey state-of-the-art methods across five critical aspects of building an accurate and robust AI-powered malware-detection model: malware sophistication, analysis techniques, malware repositories, feature selection, and machine learning vs. deep learning. The effectiveness of an AI model is dependent on the quality of the features it is trained with. In turn, the quality and authenticity of these features is dependent on the quality of the dataset and the suitability of the analysis tool. Static analysis is fast but is limited by the widespread use of obfuscation. Dynamic analysis is not impacted by obfuscation but is defeated by ubiquitous anti-analysis techniques and requires more computational power. Sophisticated and evasive malware is challenging to extract authentic discriminatory features from and, combined with poor quality datasets, this can lead to a situation where a model achieves high accuracy with only one specific dataset.
Read full abstract