Abstract

Classification of file fragments, that is, determining file types from the available data fragments, is a crucial step in digital forensics. Apart from manual forensic examination, currently explored file fragment classification methods rely on machine learning techniques. These methods most commonly use features based on byte frequency distribution as inputs to artificial neural networks. In this paper, some new approaches to file fragment classification are explored. Files in the older MS Office formats (doc, ppt, and xls) and in the newer MS Office formats (docx, pptx, and xlsx), which were previously shown to be difficult to differentiate, were joined into two separate higher-level classes because of similarities in the structure of the included files. Different approaches to differentiating between the subtypes within each of those two higher-level classes are then explored. The results suggest that small increases in classification accuracy can be achieved using the proposed approach.
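
As an illustration of the kind of input feature mentioned above, the following minimal sketch computes a byte frequency distribution for a fragment and maps the six MS Office subtypes onto two higher-level classes. It is not the paper's implementation: the function names, the class labels `ms_office_old` and `ms_office_new`, and the choice of normalised relative frequencies are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical label names for the two higher-level classes (not from the paper).
COARSE_CLASS = {
    "doc": "ms_office_old", "ppt": "ms_office_old", "xls": "ms_office_old",
    "docx": "ms_office_new", "pptx": "ms_office_new", "xlsx": "ms_office_new",
}


def byte_frequency_distribution(fragment: bytes) -> list[float]:
    """Return a 256-element vector of relative byte frequencies.

    Assumption: frequencies are normalised by fragment length so that
    fragments of different sizes yield comparable feature vectors.
    """
    if not fragment:
        return [0.0] * 256
    counts = Counter(fragment)  # iterating over bytes yields ints in 0..255
    total = len(fragment)
    return [counts.get(value, 0) / total for value in range(256)]


def coarse_label(file_type: str) -> str:
    """Map a file type to its higher-level class; other types are unchanged."""
    return COARSE_CLASS.get(file_type, file_type)
```

In a two-stage setup of this kind, a first classifier would be trained on the coarse labels, and a second, specialised classifier would then separate the subtypes within each higher-level class. Which classifiers and features are actually used at each stage is described in the paper itself, not here.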
