Malware API Calls Detection Using Hybrid Logistic Regression and RNN Model

Abdulaziz Almaleh,Rahaf Ogran,Reem Almushabb

doi:10.3390/app13095439

Abdulaziz Almaleh, Rahaf Ogran + Show 1 more

Open Access

https://doi.org/10.3390/app13095439

Copy DOI

Journal: Applied Sciences	Publication Date: Apr 27, 2023
Citations: 2	License type: CC BY 4.0

Affiliation: King Khalid University

Abstract

Behavioral malware analysis is a powerful technique used against zero-day and obfuscated malware. Additionally referred to as dynamic malware analysis, this approach employs various methods to achieve enhanced detection. One such method involves using machine learning and deep learning algorithms to learn from the behavior of malware. However, the task of weight initialization in neural networks remains an active area of research. In this paper, we present a novel hybrid model that utilizes both machine learning and deep learning algorithms to detect malware across various categories. The proposed model achieves this by recognizing the malicious functions performed by the malware, which can be inferred from its API call sequences. Failure to detect these malware instances can result in severe cyberattacks, which pose a significant threat to the confidentiality, privacy, and availability of systems. We rely on a secondary dataset containing API call sequences, and we apply logistic regression to obtain the initial weight that serves as input to the neural network. By utilizing this hybrid approach, our research aims to address the challenges associated with traditional weight initialization techniques and to improve the accuracy and efficiency of malware detection based on API calls. The integration of both machine learning and deep learning algorithms allows the proposed model to capitalize on the strengths of each approach, potentially leading to a more robust and versatile solution to malware detection. Moreover, our research contributes to the ongoing efforts in the field of neural networks, by offering a novel perspective on weight initialization techniques and their impact on the performance of neural networks in the context of behavioral malware analysis. Experimental results using a balanced dataset showed 83% accuracy and a 0.44 loss, which outperformed the baseline model in terms of the minimum loss. The imbalanced dataset’s accuracy was 98%, and the loss was 0.10, which exceeded the state-of-the-art model’s accuracy. This demonstrates how well the suggested model can handle malware classification.

Full Text