Abstract

Traditional malware detection methods cannot keep pace with the volume of newly created malware. Machine learning is a promising way to detect and classify new malware at scale from the features of samples. Current research applies machine learning techniques, such as the Gradient Boosting Decision Tree (GBDT) and deep neural networks, to learn newly created malware rapidly and accurately. We propose a malware classification method based on the Control-Flow Graph (CFG) and the Graph Isomorphism Network (GIN): we first extract the CFG from portable executable (PE) files and use the large-scale pre-trained language model MiniLM to generate the node features of the CFG. The GIN then compresses the extracted CFG into a feature vector, which a Multi-Layer Perceptron classifies. To evaluate our approach, we built a CFG-based malware detection dataset from the PE files of the Dike Dataset, which we call the Malware Geometric Dataset (MGD). In our evaluation, the proposed method achieved an Area Under the Curve (AUC) of 0.9977 and a detection rate of 97.44% at a False Positive Rate of 0.1%.
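
As an illustration of the graph-compression step described in the abstract, the following is a minimal sketch of one GIN-style aggregation followed by a sum readout. It is not the authors' implementation: the function names, the toy two-dimensional node features (standing in for MiniLM embeddings), the treatment of CFG edges as directed messages, and the omission of the learned MLP inside the layer are all assumptions made for brevity.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def gin_aggregate(
    features: Dict[int, Vector],
    edges: List[Tuple[int, int]],
    eps: float = 0.0,
) -> Dict[int, Vector]:
    """One GIN-style aggregation: (1 + eps) * h_v + sum of predecessor features.

    A full GIN layer would pass the result through a learned MLP;
    that step is left out here (assumption made for brevity).
    """
    out = {v: [(1.0 + eps) * x for x in h] for v, h in features.items()}
    for src, dst in edges:
        # Assumption: a message flows along each control-flow edge src -> dst.
        out[dst] = [a + b for a, b in zip(out[dst], features[src])]
    return out


def sum_readout(features: Dict[int, Vector]) -> Vector:
    """Compress all node vectors into one graph-level vector by summation."""
    dim = len(next(iter(features.values())))
    return [sum(h[i] for h in features.values()) for i in range(dim)]


# Hypothetical CFG with three basic blocks and toy 2-d node features.
node_features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
cfg_edges = [(0, 1), (0, 2), (1, 2)]

updated = gin_aggregate(node_features, cfg_edges)
graph_vector = sum_readout(updated)
print(graph_vector)
```

In a complete pipeline, the graph-level vector would be the input to the Multi-Layer Perceptron classifier; summation is used for the readout here because it preserves multiset information, which is the usual motivation for GIN's sum aggregator.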
