In recent years, digital copies of paper documents have become widely used with the growing prevalence of online services. As a result, it is critical to validate the authenticity of uploaded document images to protect against attacks from malicious users. Among the various types of attacks, the recapture attack (reprinting a document and capturing it again) is effective in concealing traces of document forgery, and detecting recaptured document images is challenging. To address this problem, we first study the halftone cell distortion introduced in both genuine and recaptured document images. Based on this study, we propose a unified model that characterizes the halftone cell distortion (e.g., errors in size and displacement) and enables accurate estimation of the distortion parameters. The statistics of the estimated parameters are then exploited in a hypothesis testing framework to detect recaptured document images: a questioned document image is authenticated by testing against the null hypothesis that the image is a genuine sample. To evaluate the performance of the proposed approach under different application scenarios, extensive experiments are conducted with three levels of prior knowledge about the printer: known printer model, known printing technique, and In-The-Wild (unknown printing device and document contents). The experimental results show that the proposed approach outperforms the data-driven benchmark approaches by a significant margin. Specifically, under the In-The-Wild experiment protocol, the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of the proposed approach is above 0.87, while the AUC of the benchmark approaches (some of which use both genuine and recaptured samples) degrades to less than 0.77.
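The hypothesis-testing step can be illustrated with a minimal sketch. It is not the method of the paper: it assumes, purely for illustration, that each image is summarized by a single scalar distortion statistic, that this statistic is approximately Gaussian over genuine images, and that recapturing tends to increase it. The function name `is_recaptured`, the parameter `alpha`, and the sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def is_recaptured(genuine_scores, questioned_score, alpha=0.05):
    """Test H0: 'the questioned image is genuine'.

    genuine_scores   -- distortion statistics from known genuine images
                        (hypothetical scalar summary, assumed Gaussian)
    questioned_score -- the same statistic for the questioned image
    alpha            -- significance level of the test (assumed value)

    Returns (reject_h0, p_value); reject_h0 == True means the image is
    flagged as recaptured.
    """
    # Fit the null distribution from genuine samples only.
    mu = mean(genuine_scores)
    sigma = stdev(genuine_scores)

    # Standardize the questioned statistic under H0.
    z = (questioned_score - mu) / sigma

    # One-sided p-value: assumes recapture increases the distortion.
    p_value = 1.0 - NormalDist().cdf(z)
    return p_value < alpha, p_value


# Illustrative values only, not data from the paper.
genuine = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
flagged, p = is_recaptured(genuine, 1.5)
```

Because the null distribution is fitted from genuine samples alone, a test of this form needs no recaptured examples, in line with the abstract's framing of authentication as a test against the genuine hypothesis.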