Abstract

File fragment classification is an important step in digital forensics. The most popular approach applies traditional machine learning to extracted features such as N-grams, Shannon entropy, or Hamming weight. However, these features alone are not enough to classify file fragments accurately. In this paper, we propose a novel scheme based on fragment-to-grayscale image conversion and deep learning that extracts more hidden features and thereby improves classification accuracy. Benefiting from multi-layered feature maps, our deep convolutional neural network (CNN) model can extract nearly ten thousand features through the non-linear connections between neurons. The proposed CNN model was trained and tested on the public dataset GovDocs. The experimental results show that it achieves 70.9% classification accuracy, which is higher than that of existing works.
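The fragment-to-grayscale conversion mentioned above can be illustrated with a minimal sketch. The abstract does not state the fragment size or image dimensions, so the 64 x 64 layout (4096 bytes per fragment), the zero padding, and the function name `fragment_to_grayscale` are assumptions made for illustration only, not the authors' implementation. Each byte is treated as one 8-bit pixel intensity.

```python
import numpy as np

def fragment_to_grayscale(fragment: bytes, side: int = 64) -> np.ndarray:
    """Map a raw file fragment to a side x side 8-bit grayscale image.

    Assumption: one byte becomes one pixel, filled row by row.
    Fragments longer than side*side bytes are truncated; shorter
    ones are zero-padded (a hypothetical choice, not from the paper).
    """
    n = side * side
    pixels = np.frombuffer(fragment[:n], dtype=np.uint8)
    if pixels.size < n:
        # Pad the tail with zeros so the reshape below is always valid.
        pixels = np.pad(pixels, (0, n - pixels.size))
    return pixels.reshape(side, side)

# Example: a synthetic 4096-byte fragment (not real GovDocs data).
image = fragment_to_grayscale(bytes(range(256)) * 16)
```

The resulting array could then be passed to a CNN as a single-channel image, typically after scaling the values to the range [0, 1].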
