Semi-supervised learning for text-line detection

Zongyi Liu, Hanning Zhou, Ning Yang

doi:10.1016/j.patrec.2010.03.015

Abstract

Automatic detection of text-lines in document images has long been studied. However, most current research focuses on boosting the detection rate rather than on noise removal. In this paper, we propose a semi-supervised learning framework that aims to segment Manhattan-layout documents with significant levels of noise. The algorithm consists of three steps: first, an initial segmentation process uses the seed filling algorithm; second, an iterative grouping process uses projection profiles to estimate the vertical border of the page contents; third, a noise removal step inside the page contents uses online training and classification. We test our algorithm on two databases. The first is the University of Washington (UW)-III database, with 1,600 images of different input qualities, which has been widely used by the Document Analysis Research (DAR) community to measure the performance of segmentation algorithms. The second is the NILE database, created by sampling from 320 journal pages in East Asian, East European, and Middle Eastern languages. The results show that our framework achieves competitive performance in both page-frame-level segmentation and text-line-level segmentation, and is particularly strong at filtering noise. They also show that our algorithm is more adaptive to language variations.
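The first two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`seed_fill_components`, `column_profile`, `content_borders`), the 4-connectivity choice, the `min_ink` threshold, and the toy `page` are all assumptions made for illustration, and the iterative grouping and the online classification step are not shown.

```python
from collections import deque

def seed_fill_components(image):
    """Return bounding boxes (top, left, bottom, right) of the 4-connected
    ink regions in a binary image given as a list of rows (1 = ink)."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Grow a region outward from this seed pixel (breadth-first fill).
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append((top, left, bottom, right))
    return components

def column_profile(image):
    """Projection profile: number of ink pixels in each column."""
    return [sum(column) for column in zip(*image)]

def content_borders(profile, min_ink=2):
    """First and last column whose ink count reaches min_ink (hypothetical
    threshold), or None if no column qualifies."""
    dense = [i for i, count in enumerate(profile) if count >= min_ink]
    return (dense[0], dense[-1]) if dense else None

# Toy page: two text-like blocks plus one isolated noise pixel at column 0.
page = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
]

boxes = seed_fill_components(page)
profile = column_profile(page)
borders = content_borders(profile)
```

On this toy page the isolated pixel forms its own one-pixel component and falls outside the estimated borders, which loosely mirrors the idea of separating page contents from marginal noise before any finer classification.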

Journal: Pattern Recognition Letters	Publication Date: Apr 18, 2010
Citations: 8

Similar Papers

A FRAMEWORK FOR ANALYZING THE INFERENCE STRUCTURE OF EDUCATIONAL ACHIEVEMENT TESTS
James L Wardrop ... Thomas H Anderson
Journal of Educational Measurement | VOL. 19
01 Mar 1982

Attraction and Isolation: The Past and Future of East Asian Languages and Cultures
Haruo Shirane
Profession | VOL. 2003
01 Dec 2003

Co-Training Semi-Supervised Active Learning Algorithm Based on Noise Filter
Chen Yabi ... Zhan Yongzhao
01 Jan 2009

Trends in overweight, obesity, and waist-to-height ratio among Australian children from linguistically diverse backgrounds, 1997 to 2015
Louise L Hardy ... Ding Ding
International journal of obesity (2005) | VOL. 43
06 Jul 2018
