Abstract

Signature-based malware detection approaches are inadequate for detecting the increasingly intelligent and large number of malware programs emerging today. Therefore, alternative approaches are required. The effects of malware can be estimated by analyzing the opcodes in its executable files. It can then be classified into families using a long short-term memory (LSTM) network. Vectorizing opcodes and application programming interface (API) function names using one-hot encoding results in high-dimensional vectors because each case is represented using one dimension. Therefore, this paper proposes a word2vec-based LSTM method to analyze opcodes and API function names using fewer dimensions. The results of opcode and API function name classification using the proposed method and one-hot encoding were compared using the Microsoft Malware Classification Challenge dataset. The proposed method showed approximately 0.5% higher performance than the one-hot encoding-based approach.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call