Abstract

Automatic Part-of-Speech (PoS) tagging is a sequence labeling problem. PoS tagging research has evolved from dictionary-lookup taggers, through rule-based and statistical schemes, to hybrid methodologies adopted for enhanced performance. The emergence of Machine Learning (ML) boosted this activity, adding deeper linguistic and computational perspectives on the problem, and in recent times the field has shifted to fully self-learning models that incorporate Deep Learning (DL) tools. Here we record and analyze this trajectory of PoS tagger development and experimentation for the Indo-Aryan languages. Rule-based and statistical models performed at an acceptable level but are neither robust nor dynamic. ML- and DL-based models outperformed all other models, with reported accuracy of up to 97% in a few cases. Various customized DL models have been tried very recently, and different groups have reported their best-performing models using a variety of combinations of pre-processing methods and DL tools, substantiated with quantitative performance reports. The paper presents structured reports, including the methodologies adopted, together with quantitative performance comparisons. This comprehensive and critical review is intended to serve as a foundation for any Indo-Aryan language PoS tagger modeling experiment, whether a fresh effort or an attempt to enhance the performance of an existing one.
