Abstract

An API call graph derived from API call sequences is widely regarded as a representative technique for understanding the behavioral characteristics of malware. In practice, however, building a behavioral graph for each malware sample is cumbersome. To address this issue, we examine how to generate a simple behavioral graph that characterizes malware. In this paper, we use word embedding to capture the contextual relationships between API functions in malware call sequences. We also propose a method that groups individual functions with similar contextual traits into clusters. Our experimental results show a significant distinction between malware and goodware call sequences. Based on this distinction, we introduce a new method for detecting and predicting malware based on the Markov chain. By modeling the behavior of malware and goodware API call sequences, we generate a semantic transition matrix that depicts the actual relations between API functions. Our models achieve an average detection precision of 0.990, with a false positive rate of 0.010. We also propose a prediction methodology that determines whether an API call sequence is malicious from its initial API calls. Our model achieves an average prediction accuracy of 0.997. The proposed approach can therefore block malicious payloads rather than detecting them after execution, avoiding the need to repair the damage.
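To make the Markov-chain idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each API call has already been mapped to a cluster label (the labels "file", "registry", "network", and "process" are hypothetical), estimates one first-order transition matrix per class with add-alpha smoothing (an assumption; the abstract does not state how the matrix is estimated), and labels a sequence by comparing its log-likelihood under the two matrices. The function names and toy training data are illustrative only.

```python
import math
from collections import defaultdict

# Hypothetical cluster labels standing in for groups of API functions.
STATES = ["file", "registry", "network", "process"]


def fit_transition_matrix(sequences, states, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = {s: defaultdict(float) for s in states}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    matrix = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        matrix[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return matrix


def log_likelihood(seq, matrix):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(matrix[p][n]) for p, n in zip(seq, seq[1:]))


def classify(seq, malware_matrix, goodware_matrix):
    """Label a sequence by whichever class model explains it better."""
    score = log_likelihood(seq, malware_matrix) - log_likelihood(seq, goodware_matrix)
    return "malware" if score > 0 else "goodware"


# Toy training data, invented purely for illustration.
malware_train = [
    ["process", "registry", "network", "network"],
    ["process", "registry", "registry", "network"],
]
goodware_train = [
    ["file", "file", "process", "file"],
    ["file", "process", "file", "file"],
]

malware_matrix = fit_transition_matrix(malware_train, STATES)
goodware_matrix = fit_transition_matrix(goodware_train, STATES)

# Scoring only a short prefix mimics prediction from the initial API calls.
prefix = ["process", "registry", "network"]
print(classify(prefix, malware_matrix, goodware_matrix))
```

Scoring only the first few calls of a sequence, as in the last lines, corresponds loosely to the prediction setting described above: a decision is made from the initial calls rather than from the full trace. A real system would need a principled choice of prefix length and decision threshold, which this sketch does not address.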
