Pixel-based Feature for Android Malware Family Classification using Machine Learning Algorithms

Mohd Zamri Osman, Mohd Faaizie Darmawan, Rahiwan Nazar Romli, Ahmad Firdaus Zainal Abidin

doi:10.1109/icsecs52883.2021.00107

Abstract

Malicious software, or malware, is a serious threat to the security and privacy of mobile phone users. The popularity of smartphones, primarily Android devices, makes them an attractive target for spreading malware. Many past solutions have proved ineffective and have produced many false positives. The ability to identify and classify malware helps prevent it from spreading and evolving. In this paper, we study the effectiveness of classifying malware families using pixel-level features. We implemented well-known machine learning classifiers: K-Nearest Neighbours (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree, and Random Forest. Binary files from 25 malware families are converted into fixed-size grayscale images. Each 100x100 image is then flattened into a single row of 10,000 columns, one per pixel. No columns are removed during this phase, so that the patterns in each malware family are preserved. The experimental results show that our approach achieved 92% accuracy with Random Forest, 88% with SVM, 81% with Decision Tree, 80% with k-NN, and 56% with Naïve Bayes. Overall, the pixel-based feature is a promising technique for identifying malware families with high accuracy, especially with the Random Forest classifier.
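The feature extraction step described in the abstract can be illustrated with a minimal sketch. It reads each byte of a binary as one grayscale pixel (0 to 255), resizes the image to 100x100, and flattens it row by row into 100 x 100 = 10,000 pixel values. The abstract does not state how the image width is chosen or how resizing is done, so the near-square layout, the zero padding, the nearest-neighbour resize, and the function names `bytes_to_image`, `resize_nearest`, and `pixel_features` are all assumptions made for illustration, not the authors' implementation.

```python
import math

SIDE = 100  # fixed image side length, as in the abstract (100x100)


def bytes_to_image(data):
    """Lay the bytes out as a near-square grayscale image, one byte per pixel.

    The near-square width and the zero padding are assumptions.
    """
    if not data:
        raise ValueError("empty binary")
    width = math.ceil(math.sqrt(len(data)))
    height = math.ceil(len(data) / width)
    padded = data + bytes(width * height - len(data))  # zero-pad the last row
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(img, side=SIDE):
    """Nearest-neighbour resize to side x side (an assumed interpolation choice)."""
    h, w = len(img), len(img[0])
    return [
        [img[r * h // side][c * w // side] for c in range(side)]
        for r in range(side)
    ]


def pixel_features(data, side=SIDE):
    """Flatten the fixed-size image row by row into one feature vector."""
    img = resize_nearest(bytes_to_image(data), side)
    return [pixel for row in img for pixel in row]


# Synthetic stand-in for a malware binary (12,800 bytes).
features = pixel_features(bytes(range(256)) * 50)
print(len(features))  # 10000
```

Stacking one such vector per sample, with the family name as the label, gives the kind of table that a classifier such as Random Forest could then be trained on. No columns are dropped, in line with the abstract.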

Full Text