Abstract

Android malware detection is a serious issue for mobile security. Recent machine learning-based approaches can achieve high accuracy, but most of them depend on labeled data for training, whereas real-world application scenarios contain far more unlabeled samples than labeled ones. As a solution, this paper proposes a framework based on contrastive learning that attempts to reduce reliance on prior knowledge and pretrains the model without labels. The results indicate that the method reaches an accuracy above 96% for malware identification on public datasets and above 98% for multiclass detection (malware class and family). In addition, when only a small proportion of samples is labeled, the model can outperform the same model trained in a purely supervised manner. This performance, comparable to that of supervised approaches, demonstrates the feasibility of contrastive learning for malware detection. Nonetheless, gaps remain to be addressed in future work.
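
The abstract does not state which contrastive objective the framework uses. Purely as a generic illustration, and not as the authors' method, the sketch below implements the NT-Xent (normalized temperature-scaled cross-entropy) loss popularized by SimCLR. It assumes that each unlabeled app has already been encoded into a non-zero embedding vector and that two augmented "views" of the same app form a positive pair, with all other embeddings in the batch acting as negatives. The function names, the view pairing, and the default temperature are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity; assumes both vectors are non-zero (hypothetical embeddings).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (nu * nv)


def nt_xent_loss(z1: List[List[float]], z2: List[List[float]],
                 temperature: float = 0.5) -> float:
    """NT-Xent loss over a batch of positive pairs (z1[i], z2[i]).

    Assumption: z1[i] and z2[i] embed two augmented views of the same
    unlabeled sample i; no labels are used anywhere in the computation.
    """
    if len(z1) != len(z2) or not z1:
        raise ValueError("z1 and z2 must be non-empty and equally long")
    n = len(z1)
    z = z1 + z2  # 2n embeddings; the positive of index k is (k + n) % (2n)
    total = 0.0
    for k in range(2 * n):
        pos = (k + n) % (2 * n)
        # Similarities to every other embedding (self-similarity excluded).
        logits = [_cosine(z[k], z[j]) / temperature
                  for j in range(2 * n) if j != k]
        pos_logit = _cosine(z[k], z[pos]) / temperature
        # Numerically stable log-sum-exp over the denominator terms.
        m = max(logits)
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        total += lse - pos_logit  # equals -log(softmax probability of the positive)
    return total / (2 * n)


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    aligned = nt_xent_loss(views_a, [[1.0, 0.0], [0.0, 1.0]])
    swapped = nt_xent_loss(views_a, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)
```

Minimizing this loss pulls the two views of each sample together and pushes different samples apart, which is how a contrastive encoder can be pretrained on unlabeled apps before a small labeled subset is used for fine-tuning.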
