A Comparative Analysis of Classifiers in the Recognition of Packed Executables

Cecilia R O Assis, Kil J B Park, Rodrigo S Miani, Murillo G Carneiro

doi:10.1109/ictai.2019.00189

Abstract

Although executable packing can serve legitimate purposes such as intellectual property protection and size reduction, malware developers use packers to obfuscate their code and thereby make static analysis more difficult. To recognize packed executables, the BinStat application was proposed. It comprises two main steps: feature extraction, which computes statistical and information-theoretic properties of a given binary; and classification, which uses a decision tree trained on features of previously labeled packed and unpacked binaries to classify new executables. The results demonstrated the tool's effectiveness, but relying on a single classifier is arguably a weakness, which we address in the present study. To that end, we rebuilt the training and test datasets and selected six classifiers for our analyses: classification and regression trees (CART), random forest, k-nearest neighbors, naive Bayes, neural network, and support vector machines. Our results show that the decision tree algorithm originally adopted in BinStat (C5.0) is not the best choice for this problem: random forest, k-nearest neighbors, and support vector machines achieved the best predictive performance.
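The two-step pipeline described above (extract statistical and information-theoretic features from a binary, then classify) can be illustrated with a small sketch. The abstract does not list BinStat's exact feature set, so the four features used here (Shannon entropy, mean and standard deviation of byte values, and the fraction of distinct byte values) are assumptions chosen as typical examples, and k-nearest neighbors stands in for just one of the six classifiers compared. The names `byte_features`, `knn_predict`, `packed_like`, and `unpacked_like` are hypothetical, and the training buffers are synthetic stand-ins rather than real executables, so the output says nothing about the paper's results.

```python
import math
import random
from collections import Counter


def byte_features(data: bytes) -> list[float]:
    """Hypothetical feature vector, each entry scaled to [0, 1]:
    [Shannon entropy / 8, mean / 255, std dev / 127.5, distinct byte values / 256]."""
    n = len(data)
    if n == 0:
        raise ValueError("cannot extract features from an empty buffer")
    counts = Counter(data)  # histogram of byte values 0..255
    # Shannon entropy in bits per byte: H = -sum(p * log2 p), between 0 and 8.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    mean = sum(data) / n
    std_dev = math.sqrt(sum((b - mean) ** 2 for b in data) / n)  # population std dev
    return [entropy / 8, mean / 255, std_dev / 127.5, len(counts) / 256]


def knn_predict(train: list[tuple[list[float], str]], query: list[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training vectors (Euclidean distance)."""
    nearest = sorted((math.dist(x, query), label) for x, label in train)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the toy data is reproducible


def packed_like(size: int) -> bytes:
    # Assumption: uniformly random bytes stand in for compressed/encrypted packed content.
    return bytes(rng.randrange(256) for _ in range(size))


def unpacked_like(size: int) -> bytes:
    # Assumption: repetitive ASCII text plus zero padding stands in for an unpacked section.
    text = b"mov eax, ebx; call func; ret; " * (size // 30 + 1)
    return text[: size // 2] + b"\x00" * (size - size // 2)


# Toy training set: five synthetic buffers per class, labeled "packed" / "unpacked".
sizes = (1024, 2048, 3072, 4096, 5120)
train = [(byte_features(packed_like(s)), "packed") for s in sizes]
train += [(byte_features(unpacked_like(s)), "unpacked") for s in sizes]

print(knn_predict(train, byte_features(packed_like(4096))))    # packed
print(knn_predict(train, byte_features(unpacked_like(3000))))  # unpacked
```

Every feature is scaled to roughly [0, 1] so that no single feature dominates the Euclidean distance used by k-NN. In a real comparison such as the one described, one would more likely extract features from actual PE files and rely on a library implementation (for example scikit-learn's `KNeighborsClassifier`) with proper train/test splits.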
