Malicious software (malware) deliberately harms computer systems. Malware is analyzed using static or dynamic analysis techniques, which extract unique patterns that allow it to be detected correctly. In this paper, a behavior-based malware detection technique is proposed. Various runtime features are extracted by setting up a dynamic analysis environment using the Cuckoo sandbox. Three primary features are processed to develop the malware classifier. First, printable strings are processed word by word using text mining techniques, which produces a very high-dimensional matrix of string features; singular value decomposition is then applied to reduce the dimensionality of these features. Second, Shannon entropy is computed over the printable strings and API calls to capture the randomness of the API and printable-string (PSI) features. In addition, behavioral features regarding file operations, registry key modifications, and network activities are used in malware detection. Finally, all features are integrated into the training feature set to develop malware classifiers using machine learning algorithms. The proposed technique is validated with 16489 malware and 8422 benign files. Our experimental results show an accuracy of 99.54% in malware detection using ensemble machine learning algorithms. Overall, the work aims to provide a high-accuracy behavior-based malware detection technique by processing runtime features in a new way.
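To illustrate the entropy feature mentioned above, the following is a minimal sketch of Shannon entropy computed over a sequence of observed items. It is not the authors' implementation: the function name `shannon_entropy`, the sample API call trace, and the choice to measure entropy over whole tokens (API names) or over characters (for a string) are assumptions made here for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Return the Shannon entropy, in bits, of the empirical distribution of `items`.

    `items` may be any iterable of hashable values, for example a list of
    API call names or the characters of a printable string.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        # An empty sequence carries no information.
        return 0.0
    # H = sum over distinct items of -p * log2(p), with p the relative frequency.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical API call trace, as might be read from a sandbox report (assumption).
api_calls = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose"]
api_entropy = shannon_entropy(api_calls)

# Character-level entropy of a hypothetical printable string (assumption).
string_entropy = shannon_entropy("aabb")
```

A trace dominated by one repeated call yields a value near zero, while a trace spread evenly over many distinct calls yields a higher value, which is the sense in which entropy summarizes the randomness of the API and string features.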