Abstract

The paper presents a comparative evaluation of multi-domain Hindi part-of-speech (POS) taggers. Two taggers are trained in this experiment in order to measure the tagger's accuracy after adapting it to the Cricket domain. The multi-domain tagger, trained as part of the ILCI project, currently covers four major domains (Health, Tourism, Entertainment and Agriculture). Adapting it to Cricket as a new domain was recently proposed in Pandey (2017), which reported a difference of approx. 6% in tagger accuracy. Statistically, the accuracy of the four-domain tagger (without Cricket) is 85%, and that of the five-domain tagger (with Cricket) is approx. 93%, which is 1% lower than the pre-existing Hindi tagger. This paper deals mainly with the evaluation of the Hindi tagger with and without Cricket as one of the domains. The author also attempts to identify the differences in POS tagging issues in the output and presents a linguistic analysis of the errors found.
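The abstract does not state how accuracy is computed. A common convention for POS taggers is token-level accuracy: the fraction of tokens whose predicted tag matches the gold-standard tag. The sketch below assumes that convention and is not taken from the paper. It also counts mismatched (gold, predicted) tag pairs, which is one way to collect errors for a linguistic error analysis. The function names `tagging_accuracy` and `confusion_pairs` and the example tag labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Token-level accuracy (assumed metric): share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted tag sequences must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion_pairs(gold, predicted):
    """Count (gold, predicted) pairs for mismatched tokens, as raw material for error analysis."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


# Hypothetical example: one proper noun mis-tagged as a common noun.
gold = ["N_NNP", "PSP", "N_NN", "V_VM"]
pred = ["N_NN", "PSP", "N_NN", "V_VM"]
print(tagging_accuracy(gold, pred))   # 0.75
print(confusion_pairs(gold, pred))    # Counter({('N_NNP', 'N_NN'): 1})
```

Under this assumption, the gap between the four-domain and five-domain taggers would be the difference between two such accuracy values, each computed on the same gold-annotated test data.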
