Abstract

Building an automatic named-entity recognition system coupled with a translation system has become an important task in Natural Language Processing applications. In this context, we aim to build a named-entity recognition system for the Tunisian dialect that also translates the recognized entities into Modern Standard Arabic. The Tunisian dialect is a variety of Arabic, yet it differs from Modern Standard Arabic and remains difficult for non-Tunisian Arabic speakers to understand. To develop our system, we studied several Tunisian dialect corpora to identify and examine the structures of the different named-entity types. The proposed method is based on a bilingual dictionary extracted from the study corpus and an elaborated set of local grammars. The local grammars were transformed into finite-state transducers using recent technologies provided by the NooJ linguistic platform. To test and evaluate the designed system, we applied it to a Tunisian dialect test corpus containing around 20,000 words. The results obtained are encouraging.
