Dataset agnostic document object detection

Ajoy Mondal, Madhav Agarwal, C. V. Jawahar

doi:10.1016/j.patcog.2023.109698

Open Access

Abstract

Localizing document objects such as tables, figures, and equations is a primary step in extracting information from document images. We propose a novel end-to-end trainable deep network, termed the Document Object Localization Network (DOLNet), for detecting the various objects present in document images. The proposed network is a multi-stage extension of Mask R-CNN with a dual backbone that uses deformable convolution, which lets it detect document objects with high accuracy at higher IoU thresholds. We empirically evaluate DOLNet on publicly available benchmark datasets, where it achieves state-of-the-art performance on most of them under the various existing experimental settings. Our solution has three important properties: (i) a single trained model, DOLNet‡, performs well across all the popular benchmark datasets; (ii) it reports excellent performance across multiple IoU thresholds, including higher ones; and (iii) it consistently demonstrates superior quantitative performance while following the same protocol as the recent works for each benchmark.
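To make the role of the IoU threshold concrete, the following is a minimal sketch of how predicted document-object boxes are commonly scored against ground truth at several IoU thresholds. The (x1, y1, x2, y2) box format, the greedy one-to-one matching, and the F1 summary are assumptions chosen for illustration; each benchmark evaluated in this work follows its own protocol, and this is not the authors' evaluation code.

```python
from typing import Dict, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_at_threshold(preds: Sequence[Box], gts: Sequence[Box],
                       thr: float) -> Tuple[int, int, int]:
    """Greedy one-to-one matching (assumption: preds sorted by descending
    confidence). Returns (true positives, false positives, false negatives)."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:  # overlap must reach the threshold to count
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp


def f1_at_thresholds(preds: Sequence[Box], gts: Sequence[Box],
                     thresholds: Sequence[float]) -> Dict[float, float]:
    """F1 score of the detections at each IoU threshold."""
    scores = {}
    for thr in thresholds:
        tp, fp, fn = match_at_threshold(preds, gts, thr)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[thr] = 2 * precision * recall / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    table_gt = [(0, 0, 100, 100)]
    table_pred = [(10, 0, 110, 100)]  # shifted 10 px to the right
    print(iou(table_pred[0], table_gt[0]))
    print(f1_at_thresholds(table_pred, table_gt, [0.5, 0.6, 0.7, 0.8, 0.9]))
```

In this toy example the prediction overlaps the ground-truth table with IoU 9/11 (about 0.82), so it counts as correct at thresholds 0.5 through 0.8 but as both a false positive and a missed object at 0.9. This is why accuracy at higher IoU thresholds demands tighter box localization.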