Detection Method for Classifying Malicious Firmware

David Noever, Samantha E Miller Noever

doi:10.5121/ijnsa.2021.13601

Abstract

A malicious firmware update may prove devastating to the embedded devices that make up the Internet of Things (IoT) and that typically lack the security verifications now applied to full operating systems. This work converts the binary headers of 40,000 firmware examples from bytes into 1024-pixel thumbnail images to train a deep neural network. The aim is to distinguish benign from malicious variants using modern deep learning methods, without needing detailed functional or forensic analysis tools. One outcome of this image conversion is a connection to the vast machine learning literature already applied to digit recognition (MNIST). Another result indicates that classification accuracy greater than 90% is possible using image-based convolutional neural networks (CNN) combined with transfer learning methods. The envisioned CNN application would intercept firmware updates before their distribution to IoT networks and score their likelihood of containing malicious variants. To explain how the model makes classification decisions, the research applies traditional statistical methods, such as single decision trees and ensembles of decision trees, with identifiable pixel or byte values that contribute to the malicious or benign determination.

Highlights

Image classifiers represent a novel approach to abstracting small differences in program executables, for closely matched cases where human or rule-based inspections fail.
We have previously found this approach useful for understanding image classification in both malware detection (V-MNIST) [18] and intrusion detection [19].
The small (32x32) grayscale images correspond to a decimal conversion (0-15) of the raw binary, scaled to a wider (0-255) pixel value range.

Summary

Introduction

Image classifiers represent a novel approach to abstracting small differences in program executables, for closely matched cases where human or rule-based inspections fail. We convert a common executable format from raw bytes to decimal (0-15) and scale this identifying image into 256 grayscale pixel values (Figure 1). This process of transforming compiled bytes to images extends previous breakthroughs in computer vision and promises continued enhancement as more sophisticated deep learning methods advance.
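
The byte-to-image conversion described above can be illustrated with a minimal sketch. The text does not spell out the exact mapping, so the following rests on stated assumptions: each header byte is split into two 4-bit values (0-15), each value is multiplied by 17 to span the 0-255 range, and inputs shorter than 512 bytes are zero-padded. The function name `header_to_thumbnail` and the constants are hypothetical and are not taken from the paper.

```python
from typing import List

SIDE = 32                          # thumbnail is SIDE x SIDE = 1024 pixels
PIXELS_NEEDED = SIDE * SIDE        # one pixel per 4-bit value (assumption)
BYTES_NEEDED = PIXELS_NEEDED // 2  # two 4-bit values per byte -> 512 bytes


def header_to_thumbnail(data: bytes) -> List[List[int]]:
    """Map the first 512 bytes of a binary to a 32x32 grayscale grid.

    Assumed mapping (not confirmed by the source): split every byte into
    a high and a low 4-bit value (0-15) and multiply by 17, so that
    0 maps to 0 and 15 maps to 255. Short inputs are zero-padded.
    """
    header = data[:BYTES_NEEDED].ljust(BYTES_NEEDED, b"\x00")
    pixels: List[int] = []
    for byte in header:
        pixels.append((byte >> 4) * 17)    # high 4 bits, scaled to 0-255
        pixels.append((byte & 0x0F) * 17)  # low 4 bits, scaled to 0-255
    # Reshape the flat list of 1024 pixels into 32 rows of 32 pixels.
    return [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]


# Example: a synthetic header starting with the ELF magic bytes.
image = header_to_thumbnail(b"\x7fELF" + bytes(range(256)))
```

The resulting grid has the same 1024-pixel layout as the thumbnails described in the abstract and could be passed to any image classifier that accepts 32x32 single-channel input. Other readings of the 0-15 conversion are possible, and a different reading would change the pixel values but not the overall pipeline.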

Methods

Results

Conclusion