MalVulDroid: Tracing Vulnerabilities from Malware in Android using Natural Language Processing

Shivi Garg,Niyati Baliyan

doi:10.13052/jwe1540-9589.2185

Abstract

The Android operating system is often inflicted with mobile malware attacks, which occur due to some system loopholes or vulnerabilities. One malware can exploit numerous vulnerabilities and multiple malware can exploit a single vulnerability, thus, causing many-to-many ( X : Y ) mapping between malware and vulnerability. Therefore, it is crucial to understand malware behaviour to reduce the vulnerabilities. This paper presents the concept of a “MalVulDroid” framework that maps malware to vulnerabilities using a two-dimensional matrix. The many-to-many ( X : Y ) mapping matrix is obtained by using natural language processing techniques such as Bag-of-Words (BoW) leveraging n-gram probability generation and term frequency-inverse document frequency (TF-IDF), in addition to supervised machine learning classifiers such as multilayer perceptron (MLP), a support vector machine (SVM), a ripple down rule learner (RIDOR), and a pruning rule-based classification tree (PART). This study is the first of its kind where malware-to-vulnerability mapping can be leveraged to measure the rigorousness of unknown vulnerabilities and malware during the early phases of application development. The study considers extensive datasets such as Androzoo, AMD, and CICInvesAndMal2019 with 150 malware families and 48,907 malware samples, and nine major vulnerabilities affecting Android. MalVulDroid exhibits highly promising results with an accuracy of 98.04% for unigrams, and precision and F1-scores of over 90% using ensemble classifiers.

Full Text