Abstract

Text segmentation is an active research field with many open problems. Separating the text layer from the graphics is a fundamental step toward exploiting both the text and the graphics information in a map. The language used in the map is a challenging issue in text layer separation, and all current methods have been proposed for non-Persian maps. In Persian, a text string is composed of one or more subwords, and each subword is composed of one or more letters connected together. The components of Persian text strings are therefore more diverse in size and geometric form than those of English text. As a result, overlapping Persian text and lines usually produce a complex structure that existing methods cannot handle efficiently. To address this, we compute the variation in stroke width across the input map and estimate the average line width of the graphics by analyzing these stroke widths. Given the average width of the graphical lines, we classify the complex structure into text and graphics at the pixel level. We evaluate our method on a variety of fully crossing text and graphics in Persian maps and obtain promising results, with precision above 80% and recall above 90%.
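The stroke-width idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a binarized map, approximates the stroke width at each foreground pixel by the shorter of its horizontal and vertical run lengths, takes the most frequent width as the graphical line width (which holds only when line pixels dominate the image), and labels pixels by a tolerance around that width. All function names and the `tol` parameter are hypothetical.

```python
from collections import Counter


def run_length_map(binary, horizontal):
    """For each foreground pixel, length of the foreground run through it
    along rows (horizontal=True) or along columns (horizontal=False)."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    n_lines, n_pos = (rows, cols) if horizontal else (cols, rows)

    def cell(line, pos):
        # Map (scan line, position on it) to (row, column).
        return (line, pos) if horizontal else (pos, line)

    for line in range(n_lines):
        pos = 0
        while pos < n_pos:
            r, c = cell(line, pos)
            if not binary[r][c]:
                pos += 1
                continue
            start = pos
            while pos < n_pos:
                r, c = cell(line, pos)
                if not binary[r][c]:
                    break
                pos += 1
            for k in range(start, pos):
                r, c = cell(line, k)
                out[r][c] = pos - start
    return out


def stroke_width_map(binary):
    """Assumed approximation: stroke width = min(horizontal run, vertical run)."""
    h = run_length_map(binary, True)
    v = run_length_map(binary, False)
    return [[min(a, b) for a, b in zip(hr, vr)] for hr, vr in zip(h, v)]


def estimate_line_width(widths):
    """Assumed estimator: the most frequent nonzero stroke width."""
    counts = Counter(w for row in widths for w in row if w > 0)
    if not counts:
        raise ValueError("image has no foreground pixels")
    return counts.most_common(1)[0][0]


def classify(widths, line_width, tol=1):
    """Label each pixel: '.' background, 'G' graphics, 'T' text."""
    return [
        ["." if w == 0 else ("G" if abs(w - line_width) <= tol else "T")
         for w in row]
        for row in widths
    ]


# Toy map: a 1-pixel-wide horizontal line crossed by a 3x3 "text" blob.
image = [[0] * 20 for _ in range(7)]
for c in range(20):
    image[3][c] = 1
for r in (2, 3, 4):
    for c in (8, 9, 10):
        image[r][c] = 1

widths = stroke_width_map(image)
line_width = estimate_line_width(widths)
labels = classify(widths, line_width)
print("estimated line width:", line_width)
print("\n".join("".join(row) for row in labels))
```

In this toy case the pixels where the line passes through the blob are labeled as text, which shows why a simple per-pixel threshold is only a starting point for the fully crossing structures the abstract targets.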

Highlights

  • Text extraction is a fundamental task in graphical document image analysis

  • Although many studies have been reported on text layer extraction from maps, no study has examined the effect of the Persian language in map processing research [15]

  • For our experiments, we gathered five real Persian city map images, scanned at 300 dpi, from sources such as major map publishers, the Sahab Geographic and Drafting Institute [19], and the National Cartographic Center (NCC) [20]


Summary

Introduction

Text extraction is a fundamental task in graphical document image analysis. The problem arises in many applications, such as map processing, form processing, and engineering drawing interpretation, where text and graphics are processed in largely different ways [1]-[20]. Current OCR systems cannot recognize text labels in complex mixtures of text and graphics. Both government and business organizations must frequently convert existing paper maps or raster maps into a machine-readable form that can be interfaced with current geographical information systems (GIS) or optical character recognition (OCR). Although many studies have been reported on text layer extraction from maps, no study has examined the effect of the Persian language in map processing research [15].
