Abstract

Neural Machine Translation (NMT) is crucial for Cross-Lingual Information Retrieval (CLIR), and it is effective for translating English-language queries into Telugu. In this paper, we translate English queries to Telugu using an NMT system trained on a parallel corpus. Because Telugu is a resource-poor language, it is difficult to supply NMT with a large parallel corpus, which leads to the Out-Of-Vocabulary (OOV) problem. To overcome this problem, Byte Pair Encoding (BPE) is used together with Long Short-Term Memory (LSTM) networks: BPE segments rare words into sub-words so that the model can still translate them. The system nevertheless struggles with named entities. Some of these Named Entity Recognition (NER) issues can be addressed by using bidirectional LSTMs (BiLSTMs) in the sequence-to-sequence model, which process the input in both directions and help the system recognize named entities. Accuracy measures and the BLEU score show that the translation quality of NMT with BiLSTMs is slightly, but noticeably, higher than that of NMT with regular LSTMs.
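To illustrate the architectural change the abstract describes, the following is a minimal sketch (not the authors' implementation) of a bidirectional LSTM encoder for a sequence-to-sequence NMT model in PyTorch. The vocabulary size, dimensions, and toy batch of BPE sub-word ids are assumptions chosen purely for illustration.

# Sketch of a BiLSTM encoder for a seq2seq NMT model (illustrative only).
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # bidirectional=True reads the English source (as BPE sub-words)
        # left-to-right and right-to-left, giving the encoder context on
        # both sides of rare tokens such as named entities.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Project the concatenated forward/backward states back to
        # hidden_dim so a unidirectional Telugu decoder can consume them.
        self.bridge = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, src_ids):
        embedded = self.embedding(src_ids)         # (batch, src_len, emb_dim)
        outputs, (h, c) = self.lstm(embedded)      # (batch, src_len, 2*hidden_dim)
        # Combine the final forward and backward hidden states.
        h_cat = torch.cat([h[-2], h[-1]], dim=-1)  # (batch, 2*hidden_dim)
        return outputs, torch.tanh(self.bridge(h_cat))

# Toy usage: a batch of two source sentences already segmented into
# BPE sub-word ids (the ids here are made up for this sketch).
encoder = BiLSTMEncoder(vocab_size=8000)
src = torch.tensor([[5, 42, 17, 3], [7, 99, 2, 0]])
enc_outputs, dec_init = encoder(src)
print(enc_outputs.shape, dec_init.shape)  # torch.Size([2, 4, 1024]) torch.Size([2, 512])

Swapping this encoder for a unidirectional LSTM (bidirectional=False) reproduces the baseline the paper compares against; the decoder and BPE preprocessing stay the same.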
