Abstract
This article presents a template-matching technique for extracting low-level stroke features (endpoints, junction points, line elements, and curve elements) from offline printed text. The proposed features are used to classify a subset of characters from the Gujarati script. The database consists of approximately 16,782 samples of 42 middle-zone symbols from the Gujarati character set, collected from three different sources: machine-printed books, newspapers, and laser-printed documents. Drawing on three sources adds variety in size, font type, style, ink variation, and boundary deformation. Experiments are performed on the database using a k-nearest neighbor (kNN) classifier, and the results are compared with three widely used structural features: Chain Codes (CC), Directional Element Features (DEF), and Histogram of Oriented Gradients (HoG). The results show that the proposed features are robust to these variations and perform comparably to existing approaches.
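To make the idea of template matching for stroke features concrete, the following is a minimal sketch of endpoint detection on a one-pixel-wide binary skeleton. It is an illustration only: the abstract does not give the templates used in the article, so the eight 3x3 templates here (centre pixel on, exactly one neighbour on), the zero-padding at the border, and all function names (`endpoint_templates`, `window`, `find_endpoints`) are assumptions, not the authors' method.

```python
# Hypothetical sketch: 3x3 template matching for stroke endpoints on a
# one-pixel-wide binary skeleton. The templates are illustrative assumptions,
# not the templates used in the article.

# Offsets of the eight neighbours around a centre pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def endpoint_templates():
    """Build eight 3x3 templates: centre pixel on, exactly one neighbour on."""
    templates = []
    for dr, dc in OFFSETS:
        t = [[0] * 3 for _ in range(3)]
        t[1][1] = 1
        t[1 + dr][1 + dc] = 1
        templates.append(t)
    return templates


def window(img, r, c):
    """Return the 3x3 neighbourhood around (r, c), zero-padded at the border."""
    h, w = len(img), len(img[0])
    return [
        [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
         for dc in (-1, 0, 1)]
        for dr in (-1, 0, 1)
    ]


def find_endpoints(img):
    """Return (row, col) of every pixel whose 3x3 window equals a template."""
    templates = endpoint_templates()
    return [
        (r, c)
        for r in range(len(img))
        for c in range(len(img[0]))
        if window(img, r, c) in templates
    ]


# A small L-shaped stroke: its two tips are the expected endpoints.
skeleton = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(find_endpoints(skeleton))  # [(1, 1), (3, 3)]
```

Junction points, line elements, and curve elements could be handled in the same way with additional templates. Counts of such matches per image region would then form a feature vector for a kNN classifier, though the exact feature layout is not specified in the abstract.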
More From: ACM Transactions on Asian and Low-Resource Language Information Processing