Efficient Use of Resources for Statistical Machine Translation

Karunesh Kumar Arora, Shyam Sunder Agrawal

doi:10.14429/djlit.37.11420

Abstract

Machine translation has great potential to expand the audience for ever-increasing digital collections. The success of data-driven machine translation systems is governed by the volume of parallel data on which these systems are modelled. For languages that lack such resources in large quantities, the best use of the available data can only be assured through its quality. A morphologically rich language like Hindi poses a further challenge, because a given word has many orthographic inflections and the corpus contains non-standard spellings. This increases the likelihood of encountering words that are unseen in the training corpus. The objective of this paper is to reduce redundancy in the available corpus and to draw on other resources as well, so as to make the best use of what is available. The number of words unseen by the translation model is reduced through text noise removal, spell normalisation and the use of English WordNet (EWN). The test case presented here is the English-Hindi language pair. The results are promising and set an example for optimising resources to improve translation performance in other morphologically rich languages.
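The abstract does not list the actual normalisation rules, so the following is only a minimal sketch of the general idea: spelling variants of the same Hindi word are folded into one form, and the share of test tokens unseen in training is measured before and after. The two folding rules (chandrabindu to anusvara, nukta removal), the helper names `normalise_spelling` and `oov_rate`, and the toy word lists are all assumptions made for illustration, not the authors' method or data.

```python
import unicodedata

# Illustrative rules only (an assumption, not the paper's rule set):
# fold chandrabindu into anusvara and drop the nukta, two common
# sources of spelling variation in Hindi text.
CHANDRABINDU = "\u0901"
ANUSVARA = "\u0902"
NUKTA = "\u093c"


def normalise_spelling(word: str) -> str:
    """Map spelling variants of a Devanagari word to one canonical form."""
    # NFD splits precomposed nukta letters into base letter + nukta.
    decomposed = unicodedata.normalize("NFD", word)
    folded = decomposed.replace(CHANDRABINDU, ANUSVARA).replace(NUKTA, "")
    return unicodedata.normalize("NFC", folded)


def oov_rate(train_tokens, test_tokens, normalise=lambda w: w):
    """Fraction of test tokens whose (normalised) form is absent from training."""
    vocab = {normalise(w) for w in train_tokens}
    test = [normalise(w) for w in test_tokens]
    if not test:
        return 0.0
    unseen = sum(1 for w in test if w not in vocab)
    return unseen / len(test)


# Hypothetical toy data: the same words spelled differently in train and test.
train = ["ज़रूरत", "हूँ"]
test = ["जरूरत", "हूं", "किताब"]

raw_rate = oov_rate(train, test)
normalised_rate = oov_rate(train, test, normalise_spelling)
print(f"unseen before normalisation: {raw_rate:.2f}")
print(f"unseen after normalisation:  {normalised_rate:.2f}")
```

On this toy data the unseen share falls from 3 of 3 tokens to 1 of 3, because only the third test word is genuinely absent from training. A real system would need a much richer rule set, and the WordNet step mentioned in the abstract is not covered by this sketch.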

Journal: DESIDOC Journal of Library & Information Technology
Publication Date: Oct 23, 2017
Citations: 1
License type: CC BY-NC-ND 2.5 IN
