Neural Machine Translation (NMT) is a crucial component of Cross-Lingual Information Retrieval (CLIR), and it performs well at translating English-language queries into Indian languages. This study focuses on translating English queries into Telugu. The NMT system is trained on a parallel corpus; because Telugu is a low-resource language, assembling a sizable parallel corpus is exceedingly difficult, so the system encounters the Out-of-Vocabulary (OOV) problem. A Long Short-Term Memory (LSTM) model with Byte Pair Encoding (BPE) addresses the OOV issue by breaking rare words into subwords and translating those. However, challenges such as Named Entity Recognition (NER) remain. In sequence-to-sequence models, Bidirectional LSTMs (BiLSTMs) can address some of these NER challenges, since they process the input in both directions, which benefits the recognition of named entities. As indicated by accuracy metrics and the Bilingual Evaluation Understudy (BLEU) score, the translation efficiency of NMT with BiLSTMs is significantly higher than with standard LSTMs.
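To make the OOV-handling idea concrete, the following is a minimal sketch of standard Byte Pair Encoding in the style of Sennrich et al.: merges are learned from a training corpus, and a rare or unseen word is then segmented into known subword units. The function names and the toy corpus are illustrative assumptions, not the paper's implementation.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    # Count adjacent symbol pairs across the whole vocabulary
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the chosen pair with its concatenation
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

def learn_bpe(corpus_words, num_merges):
    # Start from character-level symbols plus an end-of-word marker
    vocab = Counter(' '.join(list(w)) + ' </w>' for w in corpus_words)
    merges = []
    for _ in range(num_merges):
        stats = get_pair_stats(vocab)
        if not stats:
            break
        best = max(stats, key=stats.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges

def segment(word, merges):
    # Apply the learned merges in order to split a (possibly unseen) word
    symbols = list(word) + ['</w>']
    for pair in merges:
        i = 0
        while i < len(symbols) - 1:
            if (symbols[i], symbols[i + 1]) == pair:
                symbols[i:i + 2] = [''.join(pair)]
            else:
                i += 1
    return symbols

# Toy example: "lowest" never occurs in training, but its subwords do
merges = learn_bpe(["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3, 10)
print(segment("lowest", merges))
```

An OOV word such as "lowest" is thus decomposed into subword units seen during training, each of which the translation model can handle; the same mechanism applies to rare Telugu word forms.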