Abstract

Text detection is a primary task in the digitization of ancient books. Unlike common scene text detection benchmarks (ICDAR, TotalText, etc.), handwritten ancient documents contain text that is densely distributed and generally small. Their layouts are also more complex, with mixed arrangements of pictures and text and heavy background noise, all of which make detection difficult. Based on these characteristics of ancient book images, this paper proposes a new fusion structure built on Feature Pyramid Networks and uses FCOS as the baseline model to form a new detector, named RFCOS. We enhance the detection capability for dense and small text instances by adding bottom-up fusion paths, cross-layer connections and weighted fusion. Meanwhile, a new upsampling method and lateral connections reduce the loss of information from high-level feature maps during fusion. We verify the effectiveness of RFCOS on HWAD (Handwritten Ancient Books Dataset), a dataset containing samples in four languages (Yi, Chinese, Tibetan and Tangut), and verify its generalization on another public dataset, MTHv2. The results show that RFCOS outperforms most existing text detectors in terms of precision, recall and F-measure.
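
To make the idea of weighted fusion concrete, here is a minimal sketch in plain Python. It is not the RFCOS implementation: the abstract does not specify the weighting scheme or the upsampling method, so the sketch assumes a normalized weighted sum (in the style used by BiFPN) and nearest-neighbour upsampling. The function names `upsample_nearest` and `weighted_fusion`, the `eps` value and the toy inputs are all hypothetical.

```python
def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of a 2-D feature map (list of rows).

    Assumption: stands in for the paper's unspecified upsampling method.
    """
    out = []
    for row in fmap:
        # Repeat every value `factor` times along the width...
        wide = [v for v in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        out.extend([list(wide) for _ in range(factor)])
    return out


def weighted_fusion(fmaps, weights, eps=1e-4):
    """Fuse same-sized maps as sum(w_i * F_i) / (sum(w_i) + eps).

    Assumption: weights are clipped at zero and normalized; in a real
    detector they would be learnable parameters.
    """
    w = [max(0.0, x) for x in weights]
    total = sum(w) + eps
    rows, cols = len(fmaps[0]), len(fmaps[0][0])
    return [
        [sum(wi * f[r][c] for wi, f in zip(w, fmaps)) / total
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical toy inputs: a 2x2 high-level map and a 4x4 low-level map.
high = [[1.0, 2.0], [3.0, 4.0]]
low = [[0.5] * 4 for _ in range(4)]

# Bring the high-level map to the low-level resolution, then fuse.
fused = weighted_fusion([upsample_nearest(high), low], weights=[1.0, 1.0])
```

With equal weights the result is close to a plain average of the two maps; unequal weights would let one pyramid level contribute more to the fused output than the other.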
