A taint based approach for automatic reverse engineering of gray-box file formats

Baojiang Cui, Lingyu Wang, Yongle Hao, Fuwei Wang

doi:10.1007/s00500-015-1713-6

Abstract

File format vulnerabilities have drawn increasing attention in recent years, and the effectiveness of fuzz testing depends heavily on knowledge of the target formats. In this paper, we present systematic algorithms and methods for automatically reverse engineering input file formats. The methodology employs dynamic taint analysis to reveal implicit relationships between the input file and binary procedures; this information is then used to measure correlations among data bytes, segment the format, and infer data types. We have implemented a prototype, and general tests on 10 well-documented binary formats yielded an average successful identification rate of over 85%, while revealing more detailed structural information than coarse-grained format analysis provides. In addition, we discuss a practical pseudo-fuzzing evaluation method designed around real-world demands of security analysis, and the evaluation results demonstrate the practical effectiveness of our system.

Full Text