Abstract

The Internet has become a vital source of knowledge and communication in recent times. Continuous technological advancements have changed the way businesses operate, and everyone today lives in the digital world of engineering. Because of the Internet of Things (IoT) and its applications, people's impressions of the information revolution have improved. Malware detection and categorization are becoming more of a problem in the cybersecurity world. As a result, strong security on the Internet could protect billions of internet users from harmful behavior. In malware detection and classification techniques, several types of deep learning models are used; however, they still have limitations. This study will explore malware detection and classification elements using modern machine learning (ML) approaches, including K-Nearest Neighbors (KNN), Extra Tree (ET), Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and neural network Multilayer Perceptron (nnMLP). The proposed study uses the publicly available dataset UNSWNB15. In our proposed work, we applied the feature encoding method to convert our dataset into purely numeric values. After that, we applied a feature selection method named Term Frequency-Inverse Document Frequency (TFIDF) based on entropy for the best feature selection. The dataset is then balanced and provided to the ML models for classification. The study concludes that Random Forest, out of all tested ML models, yielded the best accuracy of 97.68 %.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.