Malware Classification Using Machine Learning Models

Sudesh Kumar, Shersingh Shersingh, Siddhant Kumar, Karan Verma

doi:10.1016/j.procs.2024.04.133

Abstract

Malware, a form of harmful software, poses a significant threat to victims by compromising data integrity and facilitating unauthorized access. Analogous to the COVID virus’s impact on the human body, untreated malware can cause ongoing internal harm until system limits are exhausted. Before proceeding with any further steps in our research, a crucial initial task is to survey the available benchmark malware datasets. To address the challenge of dataset selection, we conducted a comprehensive search for benchmark datasets and selected the CIC-MalMem-2022 dataset. Our dataset included 29,298 samples encompassing various malware families and benign instances. The proposed framework uses six machine learning algorithms, namely Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Random Forest, Naive Bayes, and Decision Tree, to classify malware. A correlation-based feature selection approach with a threshold of 0.6 is used to identify the most important features and eliminate redundant and irrelevant ones. All six models were validated with 10-fold cross-validation. After evaluating the classifiers on accuracy, precision, recall, and F1-score, the Decision Tree model clearly outperformed the others, achieving an accuracy of 0.9994240, a precision of 0.9996162, a recall of 0.9992325, and an F1-score of 0.9994882. In the testing phase, however, the Decision Tree and KNN classifiers performed comparably, both achieving an accuracy of 0.9998293. These outcomes demonstrate the model’s ability to identify and categorize malware, thereby contributing to enhanced data security.
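The abstract names two concrete steps: a correlation-based feature filter with a threshold of 0.6, and evaluation by accuracy, precision, recall, and F1-score. The sketch below illustrates both using only the Python standard library. It is not the authors' code. It assumes one common reading of "correlation-based feature selection": greedily keep a feature only if its absolute Pearson correlation with every already-kept feature is at most 0.6. The helper names `pearson`, `drop_correlated`, and `classification_metrics` are hypothetical, and the metrics treat the task as binary (malware = 1, benign = 0).

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated (assumption)
    return cov / (sx * sy)


def drop_correlated(columns: Dict[str, List[float]], threshold: float = 0.6) -> List[str]:
    """Hypothetical redundancy filter: keep a feature only if |r| <= threshold
    against every feature already kept, scanning columns in dict order."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary label (positive = malware)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy feature columns: "b" is a scaled copy of "a" (r = 1.0), so it is dropped.
    features = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
    print(drop_correlated(features, threshold=0.6))
    print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

Because the filter is greedy, which member of a correlated pair survives depends on column order. A real pipeline on CIC-MalMem-2022 would more likely use a dataframe correlation matrix, and the paper may also have scored correlation against the class label, which this sketch does not do.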

Journal: Procedia Computer Science
Publication Date: Jan 1, 2024
License type: cc-by-nc-nd

Similar Papers

Comparing the Efficiency of Heart Disease Prediction using Novel Random Forest, Logistic Regression and Decision Tree And SVM Algorithms
S.T.P Prasanna ... T Veeramani
CARDIOMETRY | VOL. - | 14 Feb 2023

Machine Learning Approach to Support the Detection of Parkinson's Disease in IMU-Based Gait Analysis
D Trabassi ... S.F Castiglia
Gait & Posture | VOL. 97 | 01 Oct 2022

Predicting Functional Outcome Using 24-Hour Post-Treatment Characteristics: Application of Machine Learning Algorithms in the STRATIS Registry.
Alicia C Castonguay ... Osama O Zaidat
Annals of Neurology | VOL. 93 | 29 Oct 2022

Pixel-based Feature for Android Malware Family Classification using Machine Learning Algorithms
Mohd Zamri Osman ... Ahmad Firdaus Zainal Abidin
01 Aug 2021
