Dynamic, behavior-based malware analysis and detection is considered one of the most promising ways to combat obfuscated and previously unknown malware. Behavioral feature abstraction plays a fundamental role in such analysis, because the way a program's behavior is formally specified largely determines which algorithms can be applied to it. In existing research, graph-based methods dominate the specification of malware behavior, but they restrict the choice of detection algorithm to graph mining algorithms. In this paper, we build a complete virtual environment for capturing malware behaviors, with particular attention to stimulating the network behaviors of a malware sample. We then study the problem of abstracting stable behavioral features from API call sequences and propose a minimal security-relevant behavior abstraction, which retains the strengths of prevalent graph-based methods in behavior representation and offers the following advantages. First, API calls are aggregated by data dependence, so the abstraction is resistant to redundant data and yields a more stable feature. Second, API call arguments are also abstracted, which further contributes to behavioral features that are common and stable across malware variants. Third, each behavior is a moderate aggregation of a small group of API calls, constructed around an independent operation on a sensitive resource. Fourth, the extracted behaviors are easy to embed in a high-dimensional vector space, so they can be processed by almost all prevalent statistical learning algorithms. We evaluate these minimal security-relevant behaviors in three kinds of tests: similarity comparison, clustering, and classification. The experimental results show that our method can distinguish malware of different families from one another and from benign programs, and that it is useful for many statistical learning algorithms.
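To make the idea of aggregating API calls by data dependence and embedding the result in a vector space more concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a hypothetical trace format of (API name, handle, argument) tuples, uses a shared handle as a simple stand-in for data dependence, and maps arguments to coarse categories chosen only for illustration. The function names, the categories, and the sample traces are all assumptions.

```python
import math
from collections import Counter, defaultdict


def abstract_argument(arg):
    """Map a concrete argument to a coarse category (illustrative categories only)."""
    lowered = arg.lower()
    if lowered.startswith(("hklm\\", "hkcu\\")):
        return "REGISTRY"
    if lowered.endswith((".exe", ".dll", ".sys")):
        return "EXECUTABLE"
    return "OTHER"


def extract_behaviors(trace):
    """Group calls that share a handle, used here as a proxy for data dependence."""
    groups = defaultdict(list)
    labels = {}
    for api, handle, arg in trace:
        groups[handle].append(api)
        if arg is not None and handle not in labels:
            labels[handle] = abstract_argument(arg)
    behaviors = []
    for handle, apis in groups.items():
        # Collapse consecutive repeats so redundant calls do not change the feature.
        collapsed = [a for i, a in enumerate(apis) if i == 0 or a != apis[i - 1]]
        behaviors.append((labels.get(handle, "OTHER"), tuple(collapsed)))
    return behaviors


def embed(behaviors, vocabulary):
    """Count how often each vocabulary behavior occurs (bag-of-behaviors vector)."""
    counts = Counter(behaviors)
    return [counts[b] for b in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Two made-up traces that differ in handle values, file names, call order,
# and one redundant WriteFile call.
variant_a = [
    ("CreateFile", 1, "C:\\Temp\\a1.exe"),
    ("WriteFile", 1, None),
    ("WriteFile", 1, None),
    ("CloseHandle", 1, None),
    ("RegOpenKey", 2, "HKLM\\Software\\Run"),
    ("RegSetValue", 2, None),
]
variant_b = [
    ("RegOpenKey", 7, "HKLM\\Software\\Run"),
    ("RegSetValue", 7, None),
    ("CreateFile", 9, "C:\\Users\\x\\b2.exe"),
    ("WriteFile", 9, None),
    ("CloseHandle", 9, None),
]

behaviors_a = extract_behaviors(variant_a)
behaviors_b = extract_behaviors(variant_b)
vocabulary = sorted(set(behaviors_a) | set(behaviors_b))
similarity = cosine(embed(behaviors_a, vocabulary), embed(behaviors_b, vocabulary))
print(behaviors_a)
print(similarity)
```

In this toy setting the two traces reduce to the same set of abstracted behaviors, which illustrates why such features can stay stable across variants. Once behaviors are plain count vectors, any standard clustering or classification algorithm can consume them, which is the property the fourth advantage refers to.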