Abstract

Signature-based malware detection approaches are inadequate for detecting the increasingly intelligent and large number of malware programs emerging today. Therefore, alternative approaches are required. The effects of malware can be estimated by analyzing the opcodes in its executable files. It can then be classified into families using a long short-term memory (LSTM) network. Vectorizing opcodes and application programming interface (API) function names using one-hot encoding results in high-dimensional vectors because each case is represented using one dimension. Therefore, this paper proposes a word2vec-based LSTM method to analyze opcodes and API function names using fewer dimensions. The results of opcode and API function name classification using the proposed method and one-hot encoding were compared using the Microsoft Malware Classification Challenge dataset. The proposed method showed approximately 0.5% higher performance than the one-hot encoding-based approach.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.