Abstract

Mobile malware detection has attracted considerable attention as people grow increasingly dependent on their mobile devices. Some researchers have sought to discover traces of malicious apps by harnessing the semantics of network traffic. However, network traffic data naturally exhibit imbalanced distributions, and few studies to date have addressed this imbalance problem. We therefore propose an oversampling method for mobile malware detection that synthesizes malicious instances from signatures built over nominal features. First, we cluster the malicious network traffic datasets and generate signatures from each cluster. Second, the signatures are used to synthesize new malicious instances. The synthetic instances, combined with instances obtained through random oversampling, are used to enrich the minority class dataset. Finally, a semantic-based mobile malware detection model is built on the rebalanced datasets. We compare our approach with 10 other resampling algorithms on two network traffic datasets with different imbalance ratios. The results show that combining the proposed method with random oversampling outperforms the 10 other resampling algorithms for imbalanced malware detection.
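
The pipeline outlined above (cluster, derive signatures, synthesize, then top up with random oversampling) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the clustering algorithm, the form of a signature, or the mixing ratio. Here clustering is replaced by grouping on a single nominal feature, a signature is assumed to be the set of feature values shared by every instance in a cluster, and `synth_fraction` is a hypothetical parameter. All function and field names are invented for the example.

```python
import random


def cluster_by_key(instances, key):
    """Group instances by one nominal feature (a stand-in for a real clustering step)."""
    clusters = {}
    for inst in instances:
        clusters.setdefault(inst[key], []).append(inst)
    return list(clusters.values())


def signature(cluster):
    """Assumed signature: feature values shared by every instance in the cluster."""
    sig = {}
    for feat in cluster[0]:
        if len({inst[feat] for inst in cluster}) == 1:
            sig[feat] = cluster[0][feat]
    return sig


def synthesize(cluster, sig, rng):
    """Build one new instance: signature values stay fixed, other features
    are sampled from the values observed in the same cluster."""
    new = {}
    for feat in cluster[0]:
        if feat in sig:
            new[feat] = sig[feat]
        else:
            new[feat] = rng.choice([inst[feat] for inst in cluster])
    return new


def rebalance(minority, majority_size, key, synth_fraction=0.5, seed=0):
    """Enrich the minority class up to majority_size using a mix of
    signature-based synthesis and plain random oversampling (duplication)."""
    if not minority:
        raise ValueError("minority class must contain at least one instance")
    rng = random.Random(seed)
    deficit = max(0, majority_size - len(minority))
    n_synth = int(deficit * synth_fraction)  # hypothetical mixing ratio
    clusters = cluster_by_key(minority, key)
    synthetic = []
    for i in range(n_synth):
        cluster = clusters[i % len(clusters)]  # cycle through clusters
        synthetic.append(synthesize(cluster, signature(cluster), rng))
    # Random oversampling: copy existing minority instances at random.
    duplicated = [dict(rng.choice(minority)) for _ in range(deficit - n_synth)]
    return minority + synthetic + duplicated


# Toy malicious flows described by nominal features only (made-up data).
malicious = [
    {"host": "a.example", "method": "POST", "path": "/x"},
    {"host": "a.example", "method": "POST", "path": "/y"},
    {"host": "b.example", "method": "GET", "path": "/z"},
]
balanced = rebalance(malicious, majority_size=10, key="host")
```

In this sketch the rebalanced minority set would then be merged with the benign instances and passed to whatever classifier is used for detection. Holding the signature values fixed is what separates the synthetic instances from plain duplicates: they vary only in the features that already vary within their cluster.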
