Abstract

Malware, a form of harmful software, poses a significant threat to victims by compromising data integrity and facilitating unauthorized access. Much as the COVID-19 virus harms the human body, untreated malware can cause ongoing internal damage until system limits are exhausted. Before proceeding with any further steps in our research, a crucial initial task is to explore the available benchmark malware datasets. To address the challenge of dataset selection, we conducted a comprehensive search for benchmark malware datasets and selected the CIC-MalMem-2022 dataset. The dataset includes 29,298 samples covering various malware families as well as benign instances. The proposed framework uses six machine learning algorithms, namely Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Random Forest, Naive Bayes, and Decision Tree, to classify malware. A correlation-based feature selection approach with a threshold of 0.6 is used to identify the most important features and eliminate redundant and irrelevant ones. All six models were evaluated using 10-fold cross-validation. Across the metrics of accuracy, precision, recall, and F1-score, the Decision Tree model outperformed the others, achieving an accuracy of 0.9994240, a precision of 0.9996162, a recall of 0.9992325, and an F1-score of 0.9994882. In the testing phase, the Decision Tree and KNN classifiers performed comparably, both achieving an accuracy of 0.9998293. These outcomes demonstrate the model's ability to identify and classify malware, thereby contributing to enhanced data security.
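The abstract does not say exactly how the 0.6 correlation threshold is applied or how the folds are formed. The sketch below is a minimal, stdlib-only illustration that assumes one common interpretation: a feature is dropped when its absolute Pearson correlation with an already-kept feature exceeds 0.6. It also includes a plain 10-fold index split. The function names (`pearson`, `drop_correlated`, `k_fold_indices`) and the toy feature names are hypothetical. They are not the authors' code and are not real CIC-MalMem-2022 columns.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated (an assumption)
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.6):
    """Greedy redundancy filter (assumed interpretation of the 0.6 threshold).

    columns: dict mapping feature name -> list of values; insertion order
    decides which feature of a correlated pair is kept.
    Returns the names of the features that survive the filter.
    """
    kept = []
    for name, values in columns.items():
        # Keep this feature only if it is not strongly correlated with any kept one.
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous, non-shuffled folds over range(n)."""
    # The first n % k folds get one extra sample so every index is used exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical toy features: f_b is a rescaled copy of f_a, so it is redundant.
    features = {
        "f_a": [1, 2, 3, 4, 5],
        "f_b": [2, 4, 6, 8, 10],
        "f_c": [5, 1, 4, 2, 3],
    }
    print(drop_correlated(features))  # f_b is removed (|r| = 1 with f_a)
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

In practice the data would usually be shuffled (and often stratified by class) before folding, and a library such as scikit-learn would normally handle both cross-validation and the six classifiers. The contiguous split here just shows the mechanics of 10-fold cross-validation.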
