Hierarchy-Based File Fragment Classification

Manish Bhatt,Rishav Rajendra,Md Tamjidul Hoque,Md Wasi Ul Kabir,S E Blake-Gatto,Avdesh Mishra,Irfan Ahmed

doi:10.3390/make2030012

Abstract

File fragment classification is an essential problem in digital forensics. Although several attempts have been made to solve this challenging problem, a general solution has not been found. In this work, we propose a hierarchical machine-learning-based approach with optimized support vector machines (SVM) as the base classifiers for file fragment classification. This approach consists of more general classifiers at the top level and more specialized, fine-grained classifiers at the lower levels of the hierarchy. We also propose a primitive taxonomy for file types that can be used to perform hierarchical classification. We evaluate our model on a dataset of 14 file types, with 1000 fragments of 512 bytes each per file type, derived from a subset of the publicly available govdocs1 corpus from Digital Corpora. Our experiment shows results comparable to the present literature, with an average accuracy of 67.78% and an F1-measure of 65% using 10-fold cross-validation. We then improve on the hierarchy and find better results, with an increase in the F1-measure of 1%. Finally, we present our assessment and observations, then conclude the paper by discussing the scope of future research.

Highlights

It is essential for a forensic investigator to be able to look at an artifact, which can be a network packet or a piece of data, and readily recognize what kind of data it is
We take a hierarchical classification approach, using support vector machines (SVM) as our base classifier. We find that this approach, unrefined, opens up a different way of looking at the file fragment classification problem
Upon optimizing the SVM parameters using grid search, we found that we got the best results for the two parameters of the Radial Basis Function (RBF)

Summary

Introduction

It is essential for a forensic investigator to be able to look at an artifact, which can be a network packet or a piece of data, and readily recognize what kind of data it is. In the machine learning description of the problem, each file type is treated as a category (class), and certain features that are thought to characterize the file fragment are extracted. We propose a technique called hierarchical classification to classify file fragments without the help of the file signatures present in headers and footers. We apply the hierarchical classification technique to 14 different file types, taking support vector machines (SVM) [21] as our base classifiers. We find that this approach, unrefined, opens up a different way of looking at the file fragment classification problem. We compare our results with the existing techniques that have been proposed in the literature, conclude the paper, and describe future work.
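The routing idea behind hierarchical classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-level `TAXONOMY` is hypothetical (it is not the taxonomy proposed in the paper), and `CentroidClassifier` is a simple nearest-centroid stand-in for the SVM base classifiers so that the example needs only the standard library. A top-level model predicts a coarse group, and a per-group model then picks the file type within that group.

```python
from collections import defaultdict

# Hypothetical two-level taxonomy (group -> file types); NOT the paper's hierarchy.
TAXONOMY = {
    "text": ["html", "csv"],
    "image": ["jpg", "png"],
}


class CentroidClassifier:
    """Nearest-centroid stand-in for a base classifier (the paper uses SVMs)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(x)
            sums[label] = [s + v for s, v in zip(sums[label], x)]
            counts[label] += 1
        # One mean feature vector per label.
        self.centroids = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict_one(self, x):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda l: squared_distance(self.centroids[l]))


class HierarchicalClassifier:
    """General classifier at the top, one specialized classifier per group below."""

    def __init__(self, taxonomy, make_base=CentroidClassifier):
        self.taxonomy = taxonomy
        self.make_base = make_base
        # Map each leaf file type to its parent group.
        self.parent = {leaf: g for g, leaves in taxonomy.items() for leaf in leaves}

    def fit(self, X, y):
        groups = [self.parent[label] for label in y]
        self.top = self.make_base().fit(X, groups)
        self.leaf_models = {}
        for group in self.taxonomy:
            # Assumes every group has at least one training sample.
            idx = [i for i, g in enumerate(groups) if g == group]
            self.leaf_models[group] = self.make_base().fit(
                [X[i] for i in idx], [y[i] for i in idx]
            )
        return self

    def predict(self, X):
        # Route each sample through the top-level model, then the group's model.
        return [self.leaf_models[self.top.predict_one(x)].predict_one(x) for x in X]


# Toy usage with made-up 2-D feature vectors.
X_toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 1.0], [0.1, 1.0],
         [5.0, 0.0], [5.1, 0.0], [5.0, 1.0], [5.1, 1.0]]
y_toy = ["html", "html", "csv", "csv", "jpg", "jpg", "png", "png"]
model = HierarchicalClassifier(TAXONOMY).fit(X_toy, y_toy)
print(model.predict([[0.0, 0.1], [5.0, 0.9]]))
```

Because `make_base` is just a factory, any classifier exposing `fit` and `predict_one` could be swapped in. One property of this design is that an error at the top level cannot be recovered lower down, so the choice of hierarchy matters.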

File Fragment Classification

Hierarchical Classification

Hierarchy Definition

Feature Descriptions

Unigram Count Distribution

Entropy and Bigram Distribution

Mean Byte Value
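The byte-level features named in the headings above can be computed directly from a raw fragment. The sketch below is an illustration under assumptions, not the paper's feature pipeline: the function names are hypothetical, the entropy is taken to be Shannon entropy over byte values, and no normalization or scaling is applied.

```python
import math
from collections import Counter


def unigram_counts(fragment: bytes) -> list:
    """Count of each byte value 0..255 in the fragment (256-element vector)."""
    counts = Counter(fragment)
    return [counts.get(value, 0) for value in range(256)]


def bigram_counts(fragment: bytes) -> Counter:
    """Counts of adjacent byte pairs, keyed by (first_byte, second_byte)."""
    return Counter(zip(fragment, fragment[1:]))


def shannon_entropy(fragment: bytes) -> float:
    """Shannon entropy in bits per byte; 0.0 for an empty fragment."""
    n = len(fragment)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(fragment).values())


def mean_byte_value(fragment: bytes) -> float:
    """Arithmetic mean of the byte values; 0.0 for an empty fragment."""
    return sum(fragment) / len(fragment) if fragment else 0.0


# A synthetic 512-byte fragment in which every byte value appears twice.
fragment = bytes(range(256)) * 2
print(len(fragment), shannon_entropy(fragment), mean_byte_value(fragment))
```

Compressed or encrypted fragments tend toward the maximum of 8 bits per byte, while plain text sits much lower, which is one reason entropy is a common feature for this problem.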

Precision

Experiment Details

Evaluation Metrics
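As a rough illustration of the metrics reported in the abstract (accuracy and F1-measure), the sketch below computes per-class precision, recall, and F1 from true and predicted labels. The function names are hypothetical, and macro averaging (an unweighted mean over classes) is an assumption: this summary does not state how the paper averages F1 across file types.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of fragments whose predicted type matches the true type."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels: one of two "pdf" fragments is misclassified as "jpg".
y_true = ["pdf", "pdf", "jpg", "jpg"]
y_pred = ["pdf", "jpg", "jpg", "jpg"]
print(per_class_scores(y_true, y_pred), macro_f1(y_true, y_pred), accuracy(y_true, y_pred))
```

In a 10-fold cross-validation setting, these scores would typically be computed on each held-out fold and then averaged.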

Comparison with Previous Works and Discussion

Conclusions and Future Works


Journal: Machine Learning and Knowledge Extraction	Publication Date: Aug 3, 2020
Citations: 8	License type: CC BY 4.0
