Abstract

The classification of malware families relies on the similarity among samples within a family, including similarity of program structure and content. Because a control flow graph is non-Euclidean structured data, it was previously difficult to use features extracted from its data and structure directly for classification. With the introduction of graph neural networks, however, classifying non-Euclidean graphs has become possible. We propose a malware family classification system based on the control flow graph and Term Frequency-Inverse Document Frequency (TF-IDF). In this system, both the branch structure of the control flow graph and the instruction sequences in its basic blocks are treated as input, and a graph neural network generates the graph feature representation of the malware family. Experimental results on the Microsoft Malware Classification Challenge dataset show that retaining the feature data of the graph structure effectively improves family classification performance, and that TF-IDF-based instruction features also improve it.
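As a rough illustration of the idea described above, the following minimal sketch computes TF-IDF vectors for the instruction sequences of basic blocks and then mixes them along control flow edges. It is not the system's actual implementation: the toy opcodes, the function names (`tfidf_block_features`, `aggregate_neighbors`, `graph_readout`), the choice to treat each basic block as a "document", the undirected treatment of edges, and the single untrained mean-aggregation step standing in for a graph neural network layer are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_block_features(blocks, vocab):
    """Return one TF-IDF vector per basic block.

    Assumption: each basic block is treated as a "document" and each
    opcode as a "term"; the abstract does not specify this detail.
    """
    n = len(blocks)
    # Document frequency: number of blocks that contain each opcode.
    df = Counter(op for block in blocks for op in set(block))
    features = []
    for block in blocks:
        counts = Counter(block)
        total = len(block) or 1  # avoid division by zero for empty blocks
        vec = []
        for op in vocab:
            tf = counts[op] / total
            idf = math.log(n / df[op]) if df[op] else 0.0
            vec.append(tf * idf)
        features.append(vec)
    return features


def aggregate_neighbors(features, edges):
    """One round of mean aggregation over each node and its neighbors.

    Assumption: a hypothetical stand-in for a single message-passing
    step, with no learned weights; edges are treated as undirected.
    """
    neighbors = {i: [i] for i in range(len(features))}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    out = []
    for i in range(len(features)):
        group = [features[j] for j in neighbors[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def graph_readout(features):
    """Mean-pool node vectors into a single graph-level vector."""
    return [sum(col) / len(features) for col in zip(*features)]


# Toy control flow graph: three basic blocks and their branch edges.
blocks = [["push", "mov", "call"], ["mov", "mov", "jmp"], ["pop", "ret"]]
edges = [(0, 1), (0, 2), (1, 2)]
vocab = sorted({op for block in blocks for op in block})

node_features = tfidf_block_features(blocks, vocab)
mixed_features = aggregate_neighbors(node_features, edges)
graph_vector = graph_readout(mixed_features)
```

In a real pipeline, the graph-level vector would be produced by trained graph neural network layers and passed to a classifier over malware families; the fixed averaging here only shows how instruction features and branch structure can both enter the representation.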
