Abstract

Malware is software capable of causing damage to computer systems. Conventional malware detection methods require either feature engineering to extract specific features or a large amount of labeled data to train an end-to-end deep learning model. Both feature engineering and labeling are laborious. In this paper, we propose SCLMD, a semi-supervised contrastive learning method for malware detection that is based on API call sequences and works with limited label information. Specifically, a heterogeneous graph is constructed from API behavior to express the rich relationships among labeled and unlabeled software. After two encoders extract the structural and sequential features of the software, we adopt cross-view contrastive learning to obtain shared, consistent features across the two views. A hybrid positive selection strategy, guided by the limited label information, selects the positive pairs for contrastive learning. Experimental results on two real-world datasets show that SCLMD outperforms the baseline methods, especially when supervised information is limited.

Keywords: Malware detection; Contrastive learning; Heterogeneous graph neural network
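To make the idea of label-guided cross-view contrastive learning concrete, the following is a minimal sketch and not the SCLMD implementation. The abstract does not specify how the hybrid positive selection works, so the rule used here is an assumption: two labeled samples are positives when their labels match, and any pair involving an unlabeled sample is a positive when the cross-view cosine similarity reaches a threshold. The function names, the `threshold` and `tau` parameters, and the InfoNCE-style loss are all illustrative choices; embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hybrid_positive_mask(labels, view_a, view_b, threshold=0.9):
    """Build mask[i][j] = True when sample j counts as a positive for sample i.

    labels[i] is a class label for labeled samples and None for unlabeled ones.
    The selection rule is a hypothetical stand-in for the paper's strategy.
    """
    n = len(labels)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # A sample's own embedding in the other view is always a positive.
                mask[i][j] = True
            elif labels[i] is not None and labels[j] is not None:
                # Both labeled: use the supervised signal.
                mask[i][j] = labels[i] == labels[j]
            else:
                # At least one unlabeled: fall back to cross-view similarity.
                mask[i][j] = cosine(view_a[i], view_b[j]) >= threshold
    return mask


def cross_view_contrastive_loss(view_a, view_b, mask, tau=0.5):
    """InfoNCE-style loss that pulls view_a[i] towards its positives in view_b."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        sims = [math.exp(cosine(view_a[i], view_b[j]) / tau) for j in range(n)]
        positives = sum(s for s, is_pos in zip(sims, mask[i]) if is_pos)
        total += -math.log(positives / sum(sims))
    return total / n


# Toy embeddings standing in for the structural and sequential encoder outputs.
labels = [1, 1, 0, None]
structural = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
sequential = [[1.0, 0.1], [1.0, 0.0], [0.1, 1.0], [0.0, 1.0]]

mask = hybrid_positive_mask(labels, structural, sequential)
loss = cross_view_contrastive_loss(structural, sequential, mask)
```

In this toy example the unlabeled fourth sample is paired with the third because their embeddings agree across views, while the two labeled malware samples are paired through their labels. A real system would compute the loss on learned encoder outputs and typically symmetrize it over both views.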
