A two-stage method for text line detection in historical documents

Tobias Grüning, Gundram Leifert, Johannes Michael, Roger Labahn, Tobias Strauß

doi:10.1007/s10032-019-00332-1

Open Access

https://doi.org/10.1007/s10032-019-00332-1

Abstract

This work presents a two-stage text line detection method for historical documents. Each detected text line is represented by its baseline. In the first stage, a deep neural network called ARU-Net assigns each pixel to one of three classes: baseline, separator, or other. The separator class marks the beginning and end of each text line. The ARU-Net can be trained from scratch with a manageably small number of manually annotated example images (fewer than 50), which is made possible by data augmentation strategies. The network predictions serve as input to the second stage, which performs a bottom-up clustering to build baselines. The method handles complex layouts as well as curved and arbitrarily oriented text lines. It substantially outperforms current state-of-the-art approaches: for example, on the complex track of the cBAD: ICDAR2017 Competition on Baseline Detection, the F-value increases from 0.859 to 0.922. The framework to train and run the ARU-Net is open source.
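
To make the hand-off between the two stages concrete, here is a minimal sketch of how a three-class pixel label map could be turned into candidate baselines. It is not the authors' algorithm: the abstract only says that stage two is a bottom-up clustering, so plain 8-connected component grouping is swapped in as a simplified stand-in. The class ids (`OTHER`, `BASELINE`, `SEPARATOR`), the function name `group_baseline_pixels`, and the toy label map are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical class ids for the stage-one label map (not taken from the paper).
OTHER, BASELINE, SEPARATOR = 0, 1, 2


def group_baseline_pixels(label_map):
    """Group 8-connected BASELINE pixels into candidate baselines.

    Simplified stand-in for the bottom-up clustering stage: only BASELINE
    pixels are traversed, so a SEPARATOR pixel splits a run into two lines.
    Returns a list of pixel lists, each sorted by (x, y).
    """
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    lines = []
    for y in range(height):
        for x in range(width):
            if label_map[y][x] != BASELINE or seen[y][x]:
                continue
            # Breadth-first search over the 8-neighbourhood of this pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx]
                                and label_map[ny][nx] == BASELINE):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            lines.append(sorted(pixels))
    return lines


# Toy label map: one row of baseline pixels interrupted by a separator pixel.
toy = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 2, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
baselines = group_baseline_pixels(toy)
```

On the toy map, the separator pixel in the middle of the row prevents the two runs from merging, which mirrors the role the abstract assigns to the separator class. A real pipeline would start from network probabilities rather than hard labels and would need a clustering that tolerates gaps, curvature, and arbitrary orientation.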

Journal: International Journal on Document Analysis and Recognition (IJDAR)
Publication Date: Jul 23, 2019
Citations: 109

Similar Papers

Text Line Extraction Using Fully Convolutional Network and Energy Minimization
Berat Kurar Barakat ... Jihad El-Sana
01 Jan 2020

A Text Line Detection Method for Mathematical Formula Recognition
Xiaoyan Lin ... Zhi Tang
01 Aug 2013

A method for text line detection in natural images
Jie Yuan ... Baogang Wei
Multimedia Tools and Applications | VOL. 74
27 Sep 2013

A Robust and Binarization-Free Approach for Text Line Detection in Historical Documents
Tobias Grüning ... Gundram Leifert
01 Nov 2017