A New Approach to Tagging in Indian Languages

Kavi Narayana Murthy, Srinivasu Badugu

doi:10.13053/rcs-70-1-4

Abstract

In this paper, we present a new approach to automatic tagging that requires no machine learning algorithm or training data. We argue that the critical information required for tagging comes more from word-internal structure than from context, and we show how a well-designed morphological analyzer can assign correct tags and also disambiguate many cases of tag ambiguity. The crux of the approach is in the very definition of words. While others simply tokenize a given sentence based on spaces and take these tokens to be words, we argue that words need to be motivated by semantic and syntactic considerations, not orthographic conventions. We have worked on Telugu and Kannada; in this paper, we take Telugu as the example and show how high-quality tagging can be achieved with a fine-grained, hierarchical tag set that carries not only morpho-syntactic information but also some aspects of lexical and semantic information that are necessary or useful for syntactic parsing. In fact, entire corpora can be tagged very fast and with a good degree of guarantee of quality. We give details of our experiments and the results obtained. We believe our approach can also be applied to other languages.
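The idea of deriving a tag from word-internal structure rather than from context can be illustrated with a minimal sketch. This is not the authors' analyzer: the one-entry lexicon, the two-entry suffix table, the tag labels (`N`, `LOC`, `SOC`), and the function names are all hypothetical, chosen only to show how a stem category plus a suffix feature might combine into a hierarchical tag without any training data.

```python
# Toy, hand-written resources for illustration only.
# These are NOT the authors' lexicon, suffix table, or tag set.
STEMS = {"pustakaM": "N"}               # romanized Telugu noun, 'book'
SUFFIXES = {"lO": "LOC", "tO": "SOC"}   # locative 'in', sociative 'with'


def tag(word):
    """Return a hierarchical tag built from word-internal structure, or None."""
    # A bare stem gets its lexical category as the tag.
    if word in STEMS:
        return STEMS[word]
    # Otherwise try to split the word into a known stem plus a known suffix.
    for suffix, feature in SUFFIXES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in STEMS:
                return f"{STEMS[stem]}-{feature}"
    # Unanalyzable words are left untagged by this sketch.
    return None


def tag_sentence(tokens):
    """Tag each token independently; no context or training data is used."""
    return [(token, tag(token) or "UNK") for token in tokens]
```

In this sketch, `tag("pustakaMlO")` is built from the stem category `N` and the suffix feature `LOC`. A real morphological analyzer would also have to handle stem alternations, suffix sequences, and the word-boundary decisions the abstract raises, none of which this toy attempts.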


Journal: Research in Computing Science
Publication Date: Dec 31, 2013
Citations: 14

