Detecting attacks on Android is one of the major challenges in cybersecurity, since Android is the most popular mobile operating system. Within this system, applications require certain permissions to access critical resources. Investigating how permissions are used is therefore important for checking that an application is not misled into divulging sensitive information. This work aims to determine which distance scheme offers the best malware detection based on permission similarities. Case-based reasoning (CBR) is an approach that seeks a solution based on past experience. CBR performance relies on finding similarities between the current case and stored cases and then deducing a solution from them. This paper proposes transforming app data into vectors that record the appearance of dangerous permissions and storing these vectors in a CBR structure. We then investigate k-NN classification performance under five distance metrics: Euclidean, Cosine, Manhattan, Minkowski, and Mahalanobis. Experiments were carried out on a set of 419 applications, comprising 203 malicious and 216 benign samples. The dataset was split into a training set of 291 samples (162 benign and 129 malicious) and a testing set of 128 samples (54 benign and 74 malicious). k-Nearest Neighbor (k-NN) is used as the similarity algorithm, with the distance model varied across the five metrics. Results reveal that the Minkowski and Manhattan models provide the best overall performance for detecting Android malware based on permissions, in terms of accuracy (99.21%) and precision (97%). This work is a first step toward recommending distance metrics that researchers can use when performing permission similarity-based detection.
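To make the pipeline concrete, the following is a minimal sketch of permission-vector encoding and k-NN retrieval over a small case base. It is not the implementation used in the experiments. The permission list, the toy case base, the Minkowski order p = 3, the value k = 3, and all function names are assumptions introduced for illustration. Mahalanobis distance is left out because it requires an inverse covariance matrix estimated from the training set.

```python
import math
from collections import Counter

# Hypothetical subset of Android dangerous permissions (illustration only).
DANGEROUS_PERMISSIONS = [
    "READ_SMS", "SEND_SMS", "READ_CONTACTS",
    "ACCESS_FINE_LOCATION", "RECORD_AUDIO", "CAMERA",
]


def to_vector(requested):
    """Encode an app as a 0/1 vector of dangerous-permission appearance."""
    return [1 if p in requested else 0 for p in DANGEROUS_PERMISSIONS]


def minkowski(a, b, p=3):
    """Minkowski distance of order p (p = 3 is an assumed default)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Manhattan distance, i.e. Minkowski with p = 1."""
    return minkowski(a, b, p=1)


def euclidean(a, b):
    """Euclidean distance, i.e. Minkowski with p = 2."""
    return minkowski(a, b, p=2)


def cosine(a, b):
    """Cosine distance, defined as 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: two zero vectors are identical, otherwise maximal.
        return 0.0 if norm_a == norm_b else 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(case_base, query, distance, k=3):
    """Retrieve the k most similar stored cases and return the majority label."""
    ranked = sorted(case_base, key=lambda case: distance(case[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy case base of (vector, label) pairs; the labels are invented.
CASE_BASE = [
    (to_vector({"READ_SMS", "SEND_SMS", "READ_CONTACTS"}), "malicious"),
    (to_vector({"SEND_SMS", "READ_CONTACTS", "RECORD_AUDIO"}), "malicious"),
    (to_vector({"READ_SMS", "SEND_SMS", "ACCESS_FINE_LOCATION"}), "malicious"),
    (to_vector({"CAMERA"}), "benign"),
    (to_vector({"ACCESS_FINE_LOCATION"}), "benign"),
    (to_vector(set()), "benign"),
]

METRICS = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "minkowski": minkowski,
    "cosine": cosine,
}

if __name__ == "__main__":
    query = to_vector({"READ_SMS", "SEND_SMS"})
    for name, metric in METRICS.items():
        print(name, knn_predict(CASE_BASE, query, metric))
```

On this toy case base the four metrics agree on the query, which is expected for such small binary vectors. Differences between metrics only become visible on a larger and more varied set of cases.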