Abstract
The escalating prevalence of malware calls for proactive detection and mitigation. A successful malware attack on cloud services can have severe consequences, which makes effective malware detection in cloud environments critical. To address this need, we propose a methodology for creating a new cloud-based malware dataset, CMD_2024, which integrates static and dynamic attributes for malware analysis. The dataset comprises 20,850 samples labeled into malware categories such as Virus, Trojan, Worm, Ransomware, Adware, Miner, PUA, and Downloader, and is designed to support the testing and evaluation of analysis tools, machine learning models, deep learning models, and security systems. We enhance the dataset’s utility by focusing on dynamic features, particularly system calls within the cloud, in conjunction with static attributes. To address the under-representation of less common malware categories in the dataset, we employed the Conditional Tabular Generative Adversarial Network (CTGAN) to generate synthetic data, significantly improving the detection capability for these rare malware samples. Applying various machine learning and deep learning classifiers, including our proposed integrated deep learning models, achieved 99.42% accuracy in binary classification and 86.97% in multi-class classification. These outcomes demonstrate that the CMD_2024 dataset supports robust malware detection within cloud environments.