Abstract

Objectives: To improve the efficiency of tri-level (line, word and character) segmentation of handwritten Gujarati text.

Methods: We use a hybrid approach that segments the page image at three levels: lines, words and characters. The proposed segmentation paradigm handles touching characters, sloping (skewed) text lines, overlapping characters and similar irregularities. It was evaluated on a dataset of more than 500 images, created by the authors, containing different sentences written by different people. We use the horizontal projection technique for line segmentation, the scale-space technique for word segmentation and the vertical projection technique for character segmentation.

Findings: The experimental results show that the proposed method is more efficient for handwritten Gujarati text with diacritics. The segmentation accuracy obtained is 82% at the character level, 90% at the word level and 87% at the line level.

Novelty: We designed a methodology that segments handwritten Gujarati text with diacritics at all three levels: lines, words and characters.

Applications: The proposed tri-level segmentation is a pre-processing step that can be used in any character recognition system, such as optical character recognition (OCR).

Keywords: Deep learning; tri-level segmentation; handwritten Gujarati text
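The two projection techniques named above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation: it assumes a binarized image in which ink pixels are 1 and background pixels are 0, and the helper names (`runs_of_ink`, `segment_lines`, `segment_characters`) and the `min_ink` threshold are hypothetical. Summing ink along rows gives the horizontal projection profile used to cut lines; summing along columns gives the vertical projection profile used to cut characters.

```python
import numpy as np


def runs_of_ink(profile: np.ndarray, min_ink: int = 1) -> list[tuple[int, int]]:
    """Return half-open [start, end) ranges where profile >= min_ink."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i  # entering a band of ink
        elif value < min_ink and start is not None:
            runs.append((start, i))  # leaving the band at a gap
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # band touches the image edge
    return runs


def segment_lines(page: np.ndarray, min_ink: int = 1) -> list[np.ndarray]:
    """Horizontal projection: count ink per row and cut at (near-)empty rows."""
    row_profile = page.sum(axis=1)
    return [page[top:bottom, :] for top, bottom in runs_of_ink(row_profile, min_ink)]


def segment_characters(region: np.ndarray, min_ink: int = 1) -> list[np.ndarray]:
    """Vertical projection: count ink per column and cut at (near-)empty columns."""
    col_profile = region.sum(axis=0)
    return [region[:, left:right] for left, right in runs_of_ink(col_profile, min_ink)]


if __name__ == "__main__":
    # Toy binarized page (assumption: ink = 1, background = 0).
    page = np.zeros((7, 8), dtype=np.uint8)
    page[1:3, 1:3] = 1  # line 1, first glyph
    page[1:3, 5:7] = 1  # line 1, second glyph
    page[4:6, 2:6] = 1  # line 2
    lines = segment_lines(page)
    print(len(lines))  # 2
    print([c.shape for c in segment_characters(lines[0])])  # [(2, 2), (2, 2)]
```

On real handwriting, a plain projection profile will split off diacritics written above or below the main text line and will merge sloping lines or touching characters, so skew correction and extra handling for diacritics and touching characters would still be needed around this step.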

Highlights

  • In today's digital era, Natural Language Processing (NLP) is becoming increasingly indispensable in our day-to-day lives

  • Many authors have focused on different segmentation methods, but because of the difficulty of handwriting styles, they have not been able to achieve 100% accuracy in line, word and character segmentation

  • Each Gujarati character has a distinctive shape, and handwritten Gujarati script is irregular in style because of connected (touching) characters, overlapping words, sloping lines, etc. To handle these issues, we used projection methods for line and character segmentation and a scale-space method for word segmentation
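Scale-space word segmentation rests on the idea that, when a text line is blurred at a suitable scale, the small gaps between the characters of one word fill in while the larger gaps between words survive. The sketch below is a simplified one-dimensional version of that idea, assumed here for illustration rather than taken from the paper: it blurs the column ink profile of a binarized line (ink = 1) with a Gaussian of hypothetical scale `sigma` and keeps runs above a hypothetical `threshold`. Scale-space word segmentation as usually described in the literature instead filters the 2-D line image with anisotropic Gaussian-based filters and extracts blobs.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian sampled on [-3*sigma, 3*sigma]."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def segment_words(line: np.ndarray, sigma: float = 2.0, threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return [start, end) column ranges of words in a binarized text line."""
    # Assumption: ink = 1, background = 0; the line is wider than the kernel.
    profile = line.sum(axis=0).astype(float)
    # Blurring at scale sigma bridges narrow inter-character gaps
    # while leaving wide inter-word gaps close to zero.
    smoothed = np.convolve(profile, gaussian_kernel(sigma), mode="same")
    words: list[tuple[int, int]] = []
    start = None
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i  # blurred profile rises above threshold: a word begins
        elif value <= threshold and start is not None:
            words.append((start, i))  # profile drops back: the word ends
            start = None
    if start is not None:
        words.append((start, len(smoothed)))
    return words


if __name__ == "__main__":
    # Two toy "words", each two 2-column strokes separated by a 1-column gap.
    line = np.zeros((3, 30), dtype=np.uint8)
    line[:, [2, 3, 5, 6, 20, 21, 23, 24]] = 1
    print(len(segment_words(line)))  # 2
```

Because the blur spreads ink sideways, the returned column ranges are a few columns wider than the ink they contain, and `sigma` has to be tuned to the writer's typical character and word spacing: too small a scale splits words at character gaps, too large a scale merges neighbouring words.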


Summary

Introduction

In today's digital era, Natural Language Processing (NLP) is becoming increasingly indispensable in our day-to-day lives. To spread the benefits of technology, we have to reach the grassroots level of the population, and that requires careful, sustained work in NLP. Gujarati is the 7th most spoken language in India. The Gujarat government and local people use Gujarati as their medium of communication, both spoken and written. Many earlier studies, including those on online handwriting, have not focused deeply on word segmentation. Many authors have focused on different segmentation methods, but because of the difficulty of handwriting styles, they have not been able to achieve 100% accuracy in line, word and character segmentation.
