Abstract
Degraded document binarization has attracted considerable attention because of its strong influence on downstream document analysis tasks. In this study, we propose a novel Degraded Document Binarization model built on the vision transFormer framework, termed D2BFormer. Because the model is trainable end to end, D2BFormer jointly optimizes the parameters of the entire learning pipeline without a separate intensity-to-binary value conversion stage, which improves binarization quality. In addition, we propose a novel dual-branch encoding feature fusion module that combines architectural components from the vision transformer framework with deep convolutional neural networks. The resulting encoder extracts features from an input document that are sensitive to both global and local characteristics. Moreover, the proposed encoding module operates internally at a much lower spatial resolution than the raw input document, reducing computational complexity. Furthermore, we propose a novel progressively merged decoding feature fusion module built on carefully placed skip connections both inside and outside the decoding network. The resulting decoder progressively combines features from the corresponding encoder layers of comparable spatial resolution with up-sampled features produced by the preceding decoder layers. Finally, experiments on ten public datasets demonstrate that the proposed D2BFormer model achieves promising performance on four evaluation metrics.
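The progressively merged decoding idea described above can be illustrated with a minimal sketch. This is a hypothetical toy illustration, not the authors' implementation: it uses 1-D feature maps, nearest-neighbour 2x up-sampling, and element-wise averaging as a stand-in fusion operator, and the function names (`upsample2x`, `fuse`, `progressive_decode`) are invented for exposition.

```python
# Toy sketch of a progressively merged decoder: each stage up-samples the
# previous decoder output and fuses it with the encoder skip feature of
# matching spatial resolution. Not the paper's actual architecture.

def upsample2x(feat):
    """Nearest-neighbour 2x up-sampling of a 1-D feature map."""
    return [v for v in feat for _ in (0, 1)]

def fuse(a, b):
    """Stand-in fusion operator: element-wise average of two same-length maps."""
    assert len(a) == len(b), "skip feature must match the up-sampled resolution"
    return [(x + y) / 2 for x, y in zip(a, b)]

def progressive_decode(encoder_feats):
    """encoder_feats: encoder skip features ordered coarsest to finest,
    each at twice the spatial resolution of the previous one."""
    out = encoder_feats[0]          # start from the coarsest encoder feature
    for skip in encoder_feats[1:]:  # merge finer skips one stage at a time
        out = fuse(upsample2x(out), skip)
    return out

# Encoder features at resolutions 2, 4 and 8 (coarsest first); the decoder
# output recovers the full resolution of the finest skip.
feats = [[1.0, 3.0], [2.0, 2.0, 4.0, 4.0], [0.0] * 8]
print(len(progressive_decode(feats)))  # prints 8
```

In a real network the fusion step would be a learned operation (e.g. concatenation followed by convolution) rather than an average, but the resolution-matching pattern is the same.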