Abstract

Preprocessing the input text is an essential component of a Natural Language Processing (NLP) system. We discuss the relevance of preprocessors in the context of a Machine Translation system that we developed based on AnglaBharati technology. Text submitted for translation often contains special formats, and obtaining an appropriate translation for them is a difficult task. Sometimes they have no definite grammatical structure and cannot be handled by a language rule. This paper presents a strategy to identify special formats in English text, such as dates, currency, numbers, times, quotes, acronyms, and parentheses, for a rule-based English-Malayalam Machine Aided Translation system. AnglaBharati is a pattern-directed, rule-based system with a context-free-grammar-like structure for English, which generates a pseudo-target for a group of Indian languages. The preprocessor is one of the main modules in this translation system. It manipulates the English input text to produce an input that is more suitable for the engine to generate an appropriate translation. Extensive research has been carried out in this area to disambiguate and process the input text in order to get more suitable output from the translation engine.
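As a rough illustration of what identifying special formats can look like, the following is a minimal sketch using regular expressions. It is not the preprocessor described in this paper: the pattern set, the label names, and the function `tag_special_formats` are assumptions made for this example only, and real input would need far more rules and disambiguation.

```python
import re

# Hypothetical patterns for illustration only; the actual rules used by
# the preprocessor are not given in this abstract. Order matters: earlier
# patterns take priority when two matches overlap.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")),
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}(?:\s?[ap]\.?m\.?)?", re.IGNORECASE)),
    ("CURRENCY", re.compile(r"(?:Rs\.?\s?|\$)\d+(?:,\d{3})*(?:\.\d+)?")),
    ("ACRONYM", re.compile(r"\b[A-Z]{2,}\b")),
    ("NUMBER", re.compile(r"\b\d+(?:,\d{3})*(?:\.\d+)?\b")),
]


def tag_special_formats(text):
    """Return non-overlapping (label, matched_text, start, end) spans.

    A match is kept only if it does not overlap a span already claimed
    by a higher-priority pattern. The result is sorted by start offset.
    """
    spans = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            if all(m.end() <= s or m.start() >= e for _, _, s, e in spans):
                spans.append((label, m.group(), m.start(), m.end()))
    return sorted(spans, key=lambda span: span[2])


if __name__ == "__main__":
    sample = "The NLP meeting on 12/05/2010 at 10:30 cost Rs. 1,500."
    for label, matched, start, end in tag_special_formats(sample):
        print(label, repr(matched), start, end)
```

Giving dates, times, and currency priority over plain numbers keeps the digits inside `12/05/2010` from being tagged a second time as separate numbers. A translation pipeline could then pass each tagged span to a format-specific handler instead of the general grammar rules.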

