Language Identification in Document Images

P Barlas, D Hebert, C Chatelain, S Adam, T Paquet

doi:10.2352/issn.2470-1173.2016.17.drr-058

Abstract

This paper presents a system for automatic language identification of text regions in heterogeneous and complex documents. The system can process documents that mix printed and handwritten text and that have varied layouts. To handle this problem, the system performs three sub-tasks: writing type identification (printed/handwritten), script identification, and language identification. The methods for writing type recognition and script discrimination are based on the analysis of connected components, while the language identification approach relies on a statistical text analysis, which requires a recognition engine. We evaluate the system on a new public dataset and present detailed results on the three tasks. Our system outperforms the Google plug-in evaluated on the ground-truth transcriptions of the same dataset.
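The abstract does not specify which statistical text analysis is used for the language identification step. As an illustration only, the following minimal sketch assumes a character trigram profile per language, compared by cosine similarity, applied to text that a recognition engine has already transcribed. The function names (`char_ngrams`, `train_profiles`, `identify_language`) and the tiny `SAMPLES` corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams of lowercased, whitespace-normalized text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_profiles(samples, n=3):
    """Build one n-gram profile per language from reference text."""
    return {lang: char_ngrams(text, n) for lang, text in samples.items()}


def identify_language(text, profiles, n=3):
    """Return the language whose profile is most similar to the text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Hypothetical toy training data; a real system would need far larger corpora.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and the cat sleeps in the house",
    "french": "le renard brun saute par dessus le chien paresseux "
              "et le chat dort dans la maison",
}

profiles = train_profiles(SAMPLES)
print(identify_language("the dog is in the house", profiles))
print(identify_language("le chien est dans la maison", profiles))
```

In a pipeline like the one described, such a classifier would run last, after writing type and script identification have selected a suitable recognition engine. Recognition errors in the transcription would degrade the n-gram statistics, which is one reason results on ground-truth transcriptions and on recognized text can differ.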

Journal: Electronic Imaging
Publication Date: Feb 17, 2016
Citations: 2
License type: other-oa
