Abstract

Machine learning-based (ML-based) methods show promising performance for Android malware detection. However, although the performance of ML-based methods depends heavily on the datasets used, few studies have investigated the impact of dataset-related factors on these methods. To partially bridge this gap, we conduct an empirical study of how dataset factors affect ML-based Android malware detection. By examining the differences between datasets in real-world scenarios and those in experimental settings, we identify three dataset factors (i.e., class imbalance, quality, and timeliness) and assess their impact on ML-based Android malware detection methods. Our experiments cover more than 11K benign and 17K malicious applications. The results show that all three factors introduce significant biases into existing ML-based Android malware detection methods. Based on these results, we draw lessons for evaluating ML-based Android malware detection methods when dataset factors are taken into account.
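To illustrate the class-imbalance factor the abstract mentions, the sketch below (an assumption for illustration, not taken from the paper) shows how a degenerate detector that always predicts "benign" earns high accuracy on an imbalanced test set while detecting no malware at all. The 50%/5% malware ratios are hypothetical examples, not the paper's figures.

```python
# Hedged sketch: how test-set class imbalance can inflate accuracy.
# Labels: 1 = malicious, 0 = benign.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def malware_recall(y_true, y_pred):
    # Fraction of actual malware samples that were flagged.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return tp / positives if positives else 0.0

def always_benign(y_true):
    # Degenerate baseline: never flags anything as malware.
    return [0] * len(y_true)

balanced = [1] * 50 + [0] * 50    # 50% malware, common in experiments
realistic = [1] * 5 + [0] * 95    # ~5% malware, closer to the wild

for name, y in [("balanced", balanced), ("realistic", realistic)]:
    pred = always_benign(y)
    print(f"{name}: accuracy={accuracy(y, pred):.2f}, "
          f"malware recall={malware_recall(y, pred):.2f}")
```

On the imbalanced set the useless detector scores 95% accuracy with 0% malware recall, which is one way an experimental setting can bias reported results relative to real-world deployment.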
