File Fragment Classification Using Grayscale Image Conversion and Deep Learning in Digital Forensics

Qian Chen, Zoe L Jiang, Zhengzhong Yi, Dong Liu, Lucas C.K Hui, Junbin Fang, Guikai Xi, Qing Liao, En Zhang, Xuan Wang, Rong Li, Siuming Yiu

doi:10.1109/spw.2018.00029

Abstract

File fragment classification is an important step in digital forensics. The most popular methods are based on traditional machine learning and rely on extracted features such as N-grams, Shannon entropy, or Hamming weights. However, these features are far from sufficient to classify file fragments. In this paper, we propose a novel scheme based on fragment-to-grayscale image conversion and deep learning that extracts more hidden features and thereby improves classification accuracy. Benefiting from multi-layered feature maps, our deep convolutional neural network (CNN) model can extract nearly ten thousand features through the non-linear connections between neurons. The proposed CNN model was trained and tested on the public GovDocs dataset. The experimental results show that it achieves 70.9% classification accuracy, which is higher than that of existing works.
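The fragment-to-grayscale conversion mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the fragment size, the image dimensions, or how short fragments are handled, so the 4096-byte fragment, the 64-pixel width, the zero padding, and the function name `fragment_to_grayscale` are all assumptions made for illustration. The only idea taken from the text is that each byte value (0 to 255) can be read as one grayscale pixel intensity.

```python
def fragment_to_grayscale(fragment: bytes, width: int = 64) -> list[list[int]]:
    """Map each byte (0-255) of a fragment to one grayscale pixel, row by row.

    Assumption: the final row is zero-padded when the fragment length is not
    a multiple of `width`. The source does not specify a padding rule.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = -(-len(fragment) // width)  # ceiling division
    padded = fragment.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Synthetic 4096-byte fragment (assumed size), used only to show the shape.
fragment = bytes(range(256)) * 16
image = fragment_to_grayscale(fragment)
print(len(image), len(image[0]))  # rows, columns
```

The resulting two-dimensional grid of intensities is the kind of input a CNN expects, which may be why an image representation suits this task: convolutional filters can then pick up local byte patterns without hand-crafted features.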

Full Text