Abstract

Malware API call graph derived from API call sequences is considered as a representative technique to understand the malware behavioral characteristics. However, it is troublesome in practice to build a behavioral graph for each malware. To resolve this issue, we examine how to generate a simple behavioral graph that characterizes malware. In this paper, we introduce the use of word embedding to understand the contextual relationship that exists between API functions in malware call sequences. We also propose a method that segregating individual functions that have similar contextual traits into clusters. Our experimental results prove that there is a significant distinction between malware and goodware call sequences. Based on this distinction, we introduce a new method to detect and predict malware based on the Markov chain. Through modeling the behavior of malware and goodware API call sequences, we generate a semantic transition matrix which depicts the actual relation between API functions. Our models return an average detection precision of 0.990, with a false positive rate of 0.010. We also propose a prediction methodology that predicts whether an API call sequence is malicious or not from the initial API calling functions. Our model returns an average accuracy for the prediction of 0.997. Therefore, we propose an approach that can block malicious payloads instead of detecting them after their post-execution and avoid repairing the damage.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call