Abstract

Due to its severe damages and threats to the security of the Internet and computing devices, malware detection has caught the attention of both anti-malware industry and researchers for decades. To combat the evolving malware attacks, in this paper, we first study how to utilize both content- and relation-based features to characterize sly malware; to model different types of entities (i.e., file, archive, machine, API, DLL ) and the rich semantic relationships among them (i.e., file-archive, file-machine, file-file, API-DLL, file-API relations), we then construct a structural heterogeneous information network (HIN) and present meta-graph based approach to depict the relatedness over files. To measure the relatedness over files on the constructed HIN, since malware detection is a cost-sensitive task, it calls for efficient methods to learn latent representations for HIN. To address this challenge, based on the built meta-graph schemes, we propose a new HIN embedding model metagraph2vec on the first attempt to learn the low-dimensional representations for the nodes in HIN, where both the HIN structures and semantics are maximally preserved for malware detection. A comprehensive experimental study on the real sample collections from Comodo Cloud Security Center is performed to compare various malware detection approaches. The promising experimental results demonstrate that our developed system Scorpion which integrate our proposed method outperforms other alternative malware detection techniques. The developed system has already been incorporated into the scanning tool of Comodo Antivirus product.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call