Anomaly Detection in File Fragment Classification of Image File Formats

Zahra Seyedghorban, Mehdi Teimouri

doi:10.1109/iccke54056.2021.9721457

Abstract

File fragment classification is an important and challenging problem in many domains. Among the many file formats found in computer networks and storage devices, image file formats are of particular importance and interest. However, image formats form only a small portion of the datasets used in previous work on file fragment classification. Additionally, the methods proposed in the literature rely on limited feature sets that are suitable only for the task at hand. A challenge in image type classification is to distinguish image file fragments from fragments of any other file format. To date, all presented approaches share the assumption that each sample belongs to one of the training classes. In most applications this assumption does not hold, since a variety of file formats are transferred or stored in a given environment. This underscores the importance of discriminating image formats from other file types. In this paper, using a feature vector of length 577, we classify image fragments of 10 different file formats. We propose adding an anomaly detection step before the classification phase to identify non-image file fragments. Semi-supervised anomaly detection methods from other research domains are used as image discriminators. To evaluate the proposed method, three different anomaly detection methods are described and designed. The results are evaluated with three file format families treated as anomalies: audio, video, and text.
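The two-stage idea in the abstract (an anomaly detector that screens out non-image fragments, followed by a classifier that assigns one of the known image formats) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: a 256-bin byte histogram stands in for the paper's 577-length feature vector, whose composition is not given here; a simple centroid-distance detector stands in for the three semi-supervised anomaly detectors, which are not named here; and the names `byte_histogram`, `CentroidAnomalyDetector`, `NearestCentroidClassifier`, `classify_fragment`, and the `margin` parameter are all hypothetical.

```python
import math
from collections import Counter


def byte_histogram(fragment):
    # Hypothetical feature: 256-bin normalized byte frequencies.
    # This is NOT the paper's 577-length feature vector.
    counts = Counter(fragment)  # iterating a bytes object yields ints 0..255
    n = len(fragment) or 1
    return [counts.get(b, 0) / n for b in range(256)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    # Component-wise mean of equal-length feature vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class CentroidAnomalyDetector:
    # Stand-in semi-supervised detector: fit on image fragments only.
    def fit(self, vectors, margin=1.2):
        self.center = centroid(vectors)
        # Assumed acceptance rule: anything within `margin` times the largest
        # distance seen among the training (image) fragments is an inlier.
        self.threshold = margin * max(euclidean(v, self.center) for v in vectors)
        return self

    def is_anomaly(self, vector):
        return euclidean(vector, self.center) > self.threshold


class NearestCentroidClassifier:
    # Stand-in multi-class classifier over the known image formats.
    def fit(self, vectors, labels):
        groups = {}
        for v, label in zip(vectors, labels):
            groups.setdefault(label, []).append(v)
        self.centers = {label: centroid(vs) for label, vs in groups.items()}
        return self

    def predict(self, vector):
        return min(self.centers, key=lambda label: euclidean(vector, self.centers[label]))


def classify_fragment(fragment, detector, classifier):
    # Stage 1: reject fragments that do not look like any training image format.
    features = byte_histogram(fragment)
    if detector.is_anomaly(features):
        return "non-image"
    # Stage 2: only inliers are assigned one of the image-format labels.
    return classifier.predict(features)
```

Because the detector is fit only on image fragments, audio, video, or text fragments never need to appear in its training data; anything sufficiently far from the image data is rejected before the classifier is forced to choose among image formats. The `margin` multiplier on the largest training distance is an arbitrary illustrative choice, not a value from the paper.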
