Malware Classification Using Dynamically Extracted API Call Embeddings

Sahil Aggarwal, Fabio Di Troia

doi:10.3390/app14135731

Abstract

Malware classification, the task of assigning malware samples to distinct groups, is a crucial element of robust computer security. Machine learning has recently emerged as an apt approach to this problem: models can be trained on diverse malware attributes, such as opcodes and API calls, to extract features useful for classification. In natural language processing, word embeddings represent text as vectors in which similar words lie close together, which makes it possible to quantify word similarity. This research conducts a series of experiments with hybrid machine learning techniques. We derive word vectors from dynamic API call logs of malware and use them as features for several classifiers. We generate embeddings from the API call logs using Hidden Markov Models (HMMs) and Word2Vec, and we also employ BERT and ELMo, which are known for producing contextualized embeddings. The resulting vectors are fed into our classifiers, namely Support Vector Machines (SVMs), Random Forest (RF), k-Nearest Neighbors (kNN), and Convolutional Neural Networks (CNNs). In two distinct sets of experiments, we classify malware by family and by category. The results show that API call embeddings are an effective tool for malware classification, particularly for identifying malware families. The best combination was RF with word embeddings generated by Word2Vec, ELMo, and BERT, achieving an accuracy between 0.91 and 0.93. This result underscores the potential of our approach for classifying malware.
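
The general idea of turning API call logs into embedding features for a classifier can be illustrated with a minimal sketch. This is not the authors' pipeline: it substitutes simple co-occurrence counts for Word2Vec/HMM/BERT/ELMo embeddings and a 1-nearest-neighbor rule with cosine similarity for the paper's classifiers, so that it runs with the Python standard library alone. The API call traces, the family labels, and all function names are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical API call traces with made-up family labels (not from the paper's dataset).
TRAIN = [
    (["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"], "family_a"),
    (["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"], "family_a"),
    (["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey", "RegOpenKeyExW"], "family_b"),
    (["RegOpenKeyExW", "RegSetValueExW", "RegSetValueExW", "RegCloseKey"], "family_b"),
]


def cooccurrence_embeddings(traces, window=2):
    """Map each API call to a vector counting its neighbors within `window` positions.

    Assumption: this is a crude stand-in for a learned embedding such as Word2Vec.
    """
    vocab = sorted({call for trace in traces for call in trace})
    index = {call: i for i, call in enumerate(vocab)}
    vectors = {call: [0.0] * len(vocab) for call in vocab}
    for trace in traces:
        for i, call in enumerate(trace):
            lo, hi = max(0, i - window), min(len(trace), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[call][index[trace[j]]] += 1.0
    return vectors


def trace_vector(trace, vectors):
    """Average the embeddings of the known calls in a trace into one feature vector."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[call] for call in trace if call in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(trace, train, vectors):
    """Label a trace with the label of its most similar training trace (1-NN)."""
    query = trace_vector(trace, vectors)
    best = max(train, key=lambda item: cosine(query, trace_vector(item[0], vectors)))
    return best[1]


EMBEDDINGS = cooccurrence_embeddings([trace for trace, _ in TRAIN])
```

Calling `predict(["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"], TRAIN, EMBEDDINGS)` assigns the registry-heavy trace to the training family whose averaged embedding it most resembles. In a setting closer to the paper's, the co-occurrence step would be replaced by a trained embedding model and the 1-NN rule by a classifier such as Random Forest.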

Journal: Applied Sciences
Publication Date: Jun 30, 2024
License type: CC BY 4.0

Similar Papers

Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo
Aparna Sunil Kale ... Mark Stamp
Journal of Computer Virology and Hacking Techniques | VOL. 19
22 Apr 2022

An ensemble of pre-trained transformer models for imbalanced multiclass malware classification
Ferhat Demirkıran ... Hasan Dağ
Computers & Security | VOL. 121
27 Jul 2022

Malware Classification with Word Embedding Features
Aparna Kale ... Mark Stamp
01 Jan 2020

MalClassifier: Malware family classification using network flow sequence behaviour
Bushra A Alahmadi ... Ivan Martinovic
01 May 2018
