Abstract

In this paper, a lateral feature enhancement (LFE) backbone network is proposed to effectively enrich feature representations for page object detection across various scales. The LFE backbone network has three feature enhancement modules. First, large page object feature enhancement is a bottom-up feature pyramid that enhances the features of large page objects, which convey more important information to readers. Second, lateral feature enhancement consists of a top-down feature pyramid that propagates representative semantic features to lower layers and a lateral connection that enhances the features in each layer. Third, a lateral skip connection is designed to retain the original feature details. Stacking bottom-up, top-down and lateral connections benefits overall object detection. Feature visualization indicates that the proposed LFE backbone network enhances global semantic information as well as the detailed features of small page objects. Comparative experiments on two public benchmark datasets show that it achieves 0.950 mAP on PubLayNet and 0.892 mAP on POD, respectively, under the stricter IoU threshold of 0.8. Compared with both computer vision (CV) based unimodal detectors and multi-modal detectors, the proposed LFE network achieves strong performance. Visual comparison experiments further contrast the outputs of CV-based detectors, and the results show that our detector outperforms the others under the strict metric, especially in detecting small page objects.
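The results above are reported at an IoU threshold of 0.8, which is stricter than the more common 0.5. The abstract does not spell out the full evaluation protocol, so the following is only a minimal sketch of the standard intersection-over-union criterion that decides whether a predicted box matches a ground-truth box. The function names iou and is_true_positive and the (x1, y1, x2, y2) box format are hypothetical choices for illustration. The rest of the mAP computation (ranking detections by score, building precision-recall curves, averaging over classes) is omitted.

```python
# Hypothetical helpers for illustration; not taken from the paper's code.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.8):
    """A prediction counts as a hit only if its IoU with the ground truth reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    shifted = (2, 0, 12, 10)  # overlap area 80, union area 120
    print(round(iou(gt, shifted), 3))                       # 0.667
    print(is_true_positive(shifted, gt, threshold=0.5))     # True
    print(is_true_positive(shifted, gt, threshold=0.8))     # False
```

The shifted box would be accepted at IoU 0.5 but rejected at 0.8, which is why a stricter threshold rewards detectors whose boxes align tightly with page object boundaries.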
