Abstract
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table regions in document images is an essential prerequisite for tasks such as information extraction. However, because tables vary widely in shape and size, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results, and incorrect detections can lead to the loss of critical information.

Methods: We therefore propose a novel end-to-end trainable deep network that uses a self-supervised pretrained transformer for feature extraction to minimize incorrect detections. To better handle table regions of different shapes and sizes, we add a dual-branch context content attention module (DCCAM) to the high-dimensional features to extract contextual information, enhancing the network's ability to learn shape features. For feature fusion across scales, we replace the original 3×3 convolution with a multilayer residual module, which enriches gradient flow and improves feature representation and extraction.

Results: We evaluated our method on public document datasets and compared it with previous methods, achieving state-of-the-art results on evaluation metrics such as recall and F1-score. Code is available at https://github.com/YongZ-Lee/TD-DCCAM
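The abstract does not specify the internals of the DCCAM, but a dual-branch attention over a feature map is commonly realized as one channel-attention branch and one spatial-attention branch whose outputs are fused. The following is a minimal numpy sketch under that assumption; the function name, branch design, and additive fusion are illustrative guesses, not the paper's actual module.

```python
import numpy as np

def dual_branch_context_attention(x):
    """Hypothetical sketch of a dual-branch attention over a feature map.

    x: feature map of shape (C, H, W).
    Channel branch: global-average-pooled statistics gate each channel.
    Spatial branch: channel-averaged statistics gate each location.
    The two refined maps are fused by element-wise addition.
    """
    # Channel branch: global average pool -> sigmoid gate per channel (C,)
    channel_gate = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))
    channel_out = x * channel_gate[:, None, None]
    # Spatial branch: mean over channels -> sigmoid gate per location (H, W)
    spatial_gate = 1.0 / (1.0 + np.exp(-x.mean(axis=0)))
    spatial_out = x * spatial_gate[None, :, :]
    # Fuse the two context-refined branches; output keeps the input shape
    return channel_out + spatial_out

features = np.random.randn(8, 16, 16)   # toy (C, H, W) feature map
refined = dual_branch_context_attention(features)
```

Because both gates are shape-preserving multiplicative masks, the module can be dropped onto high-dimensional backbone features without changing downstream tensor shapes.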