Detection of malicious PE files using synthesized DNA artifacts

Sunday Cosmos Ngwobia, Anca Ralescu, David Kapp, Temesgen Kebede

doi:10.1016/j.cose.2023.103457

Abstract

The availability of sophisticated IT tools has given computer system attackers the capacity to develop dangerous metamorphic or polymorphic malware. Such malware behaves much like a biological virus, which enables it to evade detection by conventional anti-malware methods. To address this challenge, we built and trained machine learning models, which achieved 99% and 99.6% accuracy in detecting and classifying PE-oriented malware. This performance depends on several factors, but chiefly on the synthesized DNA datasets (from our previous research), which we used to train the proposed models. The synthesized DNA datasets are composed of salient features (dynamic and static) extracted, with reverse engineering and tracing tools, from the behavior of non-malicious programs (.exe, .dll, etc.) before and after infection by malware. Using an encoding algorithm and bioinformatics tools, we then synthesized these features into a DNA-like representation (datasets) akin to biological deoxyribonucleic acid (DNA).
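
The abstract does not specify the encoding algorithm, so the following is only an illustrative sketch of how extracted feature bytes could be mapped to a DNA-like string. It assumes a simple 2-bit-per-nucleotide mapping (00→A, 01→C, 10→G, 11→T); the function names and the mapping itself are hypothetical and are not taken from the paper.

```python
# Hypothetical 2-bit mapping; the paper's actual encoding may differ.
NUCLEOTIDES = "ACGT"  # index 0b00 -> A, 0b01 -> C, 0b10 -> G, 0b11 -> T


def encode_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(bases)


def decode_from_dna(sequence: str) -> bytes:
    """Invert encode_to_dna; expects a length that is a multiple of four."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            # str.index raises ValueError for any symbol outside A, C, G, T
            byte = (byte << 2) | NUCLEOTIDES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    # "MZ" is the two-byte signature that opens every PE file.
    print(encode_to_dna(b"MZ"))
```

A reversible mapping of this kind keeps the encoded sequence lossless with respect to the input bytes, so standard sequence-analysis tooling could operate on it without discarding feature information; whether the authors' encoding has this property is not stated in the abstract.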

Full Text