The impact of data fragment sizes on file type recognition

Khoa Nguyen, Dharmendra Sharma, Dat Tran, Wanli Ma

doi:10.1109/icnc.2014.6975930

Abstract

Determining the original file type of data fragments supports data recovery, spam detection, virus scanning, and network monitoring. In many cases, only unordered fragments of the original file are available for investigation, so the file type must be identified from the content of a fragment alone. However, data fragments come in different sizes, as they may be residual data recovered from storage media or network packets. Identifying the file type of larger fragments is reported to be easier than identifying that of smaller ones [1]. It is therefore important to study the impact of fragment size on file type recognition. In this paper, we study the results of applying a machine learning technique to identify the file types of data fragments of different sizes, in order to find the minimum fragment size required for file type recognition.
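
As an illustration of what content-only identification at a fixed fragment size can involve, the following is a minimal sketch in Python. The abstract does not state which features or classifier are used, so the byte-frequency histogram and Shannon entropy below are assumptions chosen because they are common in this area; the function names (split_fragments, byte_histogram, shannon_entropy) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def split_fragments(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive fragments of exactly `size` bytes.

    Any trailing bytes that do not fill a whole fragment are dropped,
    so every fragment has the same length.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


def byte_histogram(fragment: bytes) -> list[float]:
    """Return the relative frequency of each of the 256 byte values.

    Assumed feature vector: the paper's actual features are not given
    in the abstract.
    """
    if not fragment:
        raise ValueError("fragment must not be empty")
    counts = Counter(fragment)
    total = len(fragment)
    return [counts.get(value, 0) / total for value in range(256)]


def shannon_entropy(fragment: bytes) -> float:
    """Return the Shannon entropy of the fragment in bits per byte."""
    return -sum(p * math.log2(p) for p in byte_histogram(fragment) if p > 0)
```

Feature vectors computed this way for several fragment sizes (for example 512, 1024, and 4096 bytes, chosen here only as illustrative values) could be passed to any classifier, and the resulting accuracies compared across sizes. Shorter fragments give sparser histograms, which is one plausible reason why smaller fragments are harder to classify.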
