Abstract

To address two problems, namely that the features extracted from malicious code cannot fully explain its behavior and functionality, and that the neural networks in use cannot extract spatial and temporal (time-series) features at the same time, a malicious code classification method based on API call sequences and a text convolutional neural network (Text-CNN) was proposed. First, the binary analysis tool Angr was used to reverse-analyze the malicious code binary file, recover its data structures and control-flow information, and automatically generate its control flow graph and function call graph. On this basis, an API call sequence extraction algorithm was proposed that generates an API call sequence from the order in which the malicious code invokes API functions. Second, an API call sequence vectorization model was built with word2vec, so that each API function obtains its own vector representation. The API call sequence of each malicious sample was then transformed into a malicious code API matrix, which served as the input to the classification model. Finally, drawing on ideas from text classification, a malicious code classification model, MM-Text-CNN, was proposed. This model combines one-dimensional and two-dimensional convolution operations: it is suitable for input data of different sizes and can extract spatial and temporal features of the input simultaneously. Experimental results show that the proposed model can perform the malicious code classification task, reaching an accuracy of 97.83%.
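A minimal sketch of two of the steps described above, under stated assumptions: converting an API call sequence into a fixed-size API matrix, and applying a single Text-CNN style filter to it. Toy two-dimensional vectors stand in for trained word2vec embeddings, and the sequence length `max_len`, the zero-vector handling of unknown APIs and padding, the kernel values, and the helper names `build_api_matrix` and `conv1d_max_pool` are all hypothetical. This is the standard single-channel Text-CNN convolution, not the MM-Text-CNN model itself, whose combination of one- and two-dimensional convolution is not detailed in the abstract.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def build_api_matrix(
    api_seq: Sequence[str],
    embeddings: Dict[str, List[float]],
    max_len: int,
    dim: int,
) -> Matrix:
    """Stack one embedding row per API call into a max_len x dim matrix.

    Longer sequences are truncated; shorter ones are zero-padded.
    APIs missing from the (hypothetical) vocabulary map to a zero vector.
    """
    zero = [0.0] * dim
    rows = [list(embeddings.get(name, zero)) for name in api_seq[:max_len]]
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows


def conv1d_max_pool(matrix: Matrix, kernel: Matrix) -> float:
    """One Text-CNN filter: the kernel covers `h` consecutive rows and the
    full embedding width, so it slides only along the call order.
    ReLU is applied to each window, then max-over-time pooling keeps the
    strongest response as a single scalar feature.
    """
    h, width = len(kernel), len(kernel[0])
    if len(matrix) < h:
        raise ValueError("sequence shorter than kernel height")
    responses = []
    for start in range(len(matrix) - h + 1):
        s = sum(
            kernel[i][j] * matrix[start + i][j]
            for i in range(h)
            for j in range(width)
        )
        responses.append(max(0.0, s))  # ReLU
    return max(responses)  # max-over-time pooling


# Toy 2-d vectors standing in for trained word2vec embeddings (hypothetical).
embeddings = {
    "CreateFileW": [1.0, 0.0],
    "WriteFile": [0.0, 1.0],
    "CloseHandle": [0.5, 0.5],
}
# Hypothetical extracted API call sequence; "ExitProcess" is out of vocabulary.
seq = ["CreateFileW", "WriteFile", "CloseHandle", "ExitProcess"]
matrix = build_api_matrix(seq, embeddings, max_len=5, dim=2)

# Height-2 filter that responds to "CreateFileW followed by WriteFile".
kernel = [[1.0, 0.0], [0.0, 1.0]]
print(matrix)
print(conv1d_max_pool(matrix, kernel))
```

Because the kernel spans the full embedding width, it can only slide along the call order, which is why this kind of convolution captures sequential (temporal) patterns such as `CreateFileW` followed by `WriteFile`. Mapping unknown APIs and padding positions to zero vectors is one simple choice among several.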
