MalJPEG: Machine Learning Based Solution for the Detection of Malicious JPEG Images

Aviad Cohen,Yuval Elovici,Nir Nissim

doi:10.1109/access.2020.2969022

Open Access

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 31	License type: CC BY 4.0

Affiliation: Ben-Gurion University of the Negev

Abstract

In recent years, cyber-attacks against individuals, businesses, and organizations have increased. Cyber criminals are always looking for effective vectors to deliver malware to victims in order to launch an attack. Images are used on a daily basis by millions of people around the world, and most users consider images to be safe for use; however, some types of images can contain a malicious payload and perform harmful actions. JPEG is the most popular image format, primarily due to its lossy compression. It is used by almost everyone, from individuals to large organizations, and can be found on almost every device (on digital cameras and smartphones, websites, social media, etc.). Because of their harmless reputation, massive use, and high potential for misuse, JPEG images are used by cyber criminals as an attack vector. While machine learning methods have been shown to be effective at detecting known and unknown malware in various domains, to the best of our knowledge, machine learning methods have not been used specifically for the detection of malicious JPEG images. In this paper, we present MalJPEG, the first machine learning-based solution tailored specifically to the efficient detection of unknown malicious JPEG images. MalJPEG statically extracts 10 simple yet discriminative features from the JPEG file structure and leverages them with a machine learning classifier, in order to discriminate between benign and malicious JPEG images. We evaluated MalJPEG extensively on a real-world representative collection of 156,818 images which contains 155,013 (98.85%) benign and 1,805 (1.15%) malicious images. The results show that MalJPEG, when used with the LightGBM classifier, demonstrates the highest detection capabilities, with an area under the receiver operating characteristic curve (AUC) of 0.997, true positive rate (TPR) of 0.951, and a very low false positive rate (FPR) of 0.004.

Highlights

Cyber attacks targeting individuals, businesses, and organizations have increased in recent years
We present MalJPEG, a machine learning-based solution for efficient detection of unknown malicious JPEG images
The benign images were collected from social media (Facebook, Instagram, WhatsApp, etc.); we focus on viral images of different file sizes and on different topics

Summary

Introduction

Cyber attacks targeting individuals, businesses, and organizations have increased in recent years. Infosecurity Magazine reported that cyber attacks doubled in 2017.¹ Cyber attacks usually include harmful activities such as stealing confidential information, spying, or monitoring, and cause harm (sometimes significant) to the victim. Attackers may be motivated by ideology, criminal intent, a desire for publicity, etc. Some non-executable files allow an attacker to run arbitrary malicious code on the targeted victim machine when the file is opened.

We provide background material related to our research, as well as technical information regarding the structure of a JPEG image. Since the JPEG file structure is complicated, we only present the basic information needed to enable the reader to comprehend the paper and understand the proposed MalJPEG solution presented in this research. JPEG files usually have a filename extension of *.jpg or *.jpeg.
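To make the idea of static features drawn from the JPEG file structure more concrete, the following is a minimal Python sketch that walks the marker segments of a JPEG byte string and derives a few structural counts and sizes. The marker codes (SOI 0xFFD8, EOI 0xFFD9, DQT 0xFFDB, DHT 0xFFC4, APP1 0xFFE1, COM 0xFFFE, SOS 0xFFDA) come from the JPEG standard. The function names and the specific features are illustrative assumptions: this page does not list the ten features MalJPEG uses, so the sketch should not be read as the authors' feature set or implementation.

```python
import struct
from collections import Counter

# Markers that carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def scan_jpeg_markers(data: bytes):
    """Walk the marker segments of a JPEG file (hypothetical helper).

    Returns (segments, trailing) where segments is a list of
    (marker_code, payload_size) and trailing is the number of bytes
    found after the EOI marker.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = [(0xD8, 0)]
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        if code == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        pos += 2
        if code == 0xD9:  # EOI: everything after it is trailing data
            segments.append((code, 0))
            return segments, len(data) - pos
        if code in STANDALONE:
            segments.append((code, 0))
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated segment header")
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 2 or pos + length > len(data):
            raise ValueError("invalid segment length")
        segments.append((code, length - 2))  # length includes its own 2 bytes
        pos += length
        if code == 0xDA:
            # After SOS comes entropy-coded data. Skip it until the next real
            # marker; 0xFF00 is a stuffed byte and 0xFFD0-0xFFD7 are restarts.
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                if (data[pos] == 0xFF and nxt not in (0x00, 0xFF)
                        and not 0xD0 <= nxt <= 0xD7):
                    break
                pos += 1
    raise ValueError("missing EOI marker")


def sketch_features(data: bytes) -> dict:
    """Illustrative structural features; NOT the paper's feature list."""
    segments, trailing = scan_jpeg_markers(data)
    counts = Counter(code for code, _ in segments)

    def max_size(code):
        return max((size for c, size in segments if c == code), default=0)

    return {
        "file_size": len(data),
        "marker_count": len(segments),
        "dqt_count": counts[0xDB],
        "dht_count": counts[0xC4],
        "app1_max_size": max_size(0xE1),
        "com_max_size": max_size(0xFE),
        "bytes_after_eoi": trailing,
    }
```

A dictionary like this, computed per file, could be assembled into a feature matrix and passed to any tabular classifier. Features such as data after the EOI marker or unusually large metadata segments are chosen here only because they are cheap to compute without decoding the image.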

Methods

Results

Conclusion