Machine Translation Utilizing the Frequent-Item Set Concept.

Hanan A Hosni Mahmoud, Hanan Abdullah Mengash

doi:10.3390/s21041493

Open Access

https://doi.org/10.3390/s21041493

Abstract

In this paper, we introduce new concepts in the machine translation paradigm. We treat the corpus as a database of frequent word sets. A translation request triggers association rules that join phrases in the source language with phrases in the target language. Note that a sequential scan of the corpus for such phrases would increase the response time unpredictably. We therefore pre-process the bilingual corpus by proposing a data structure called the Corpus-Trie (CT), which renders a bilingual parallel corpus as a compact data structure representing frequent data item sets. We also present algorithms that use the CT to respond to translation requests, and we explore novel techniques in exhaustive experiments. Experiments were performed on specific language pairs, although the proposed method is not restricted to any specific language. Moreover, the proposed Corpus-Trie can be extended from bilingual corpora to accommodate multi-language corpora. Experiments indicated that the response time of a translation request is logarithmic in the count of unrepeated phrases in the original bilingual corpus (and thus in the Corpus-Trie size). In practical situations, 5–20% of the log of the number of nodes have to be visited. The experimental results indicate that the BLEU score of the proposed CT system increases with the number of phrases in the CT, for both English-Arabic and English-French translations. The proposed CT system was demonstrated to be better than both Omega-T and Apertium in translation quality once the corpus size exceeds 1,600,000 phrases for English-Arabic translation and 300,000 phrases for English-French translation.
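The idea of storing a parallel corpus in a trie can be illustrated with a minimal Python sketch. This is not the authors' Corpus-Trie. The names `CorpusTrieNode`, `build_corpus_trie`, and `translate` are hypothetical, and the choices to key the trie on source-language words, to keep each level sorted for binary search, and to return the most frequent target phrase are all assumptions made for illustration.

```python
from bisect import bisect_left


class CorpusTrieNode:
    """One node of a hypothetical corpus trie (illustrative, not the paper's CT)."""

    def __init__(self):
        self.keys = []          # source-language words at this level, kept sorted
        self.children = []      # child nodes, parallel to self.keys
        self.translations = {}  # target phrase -> number of times it was seen

    def child(self, word, create=False):
        # Binary search within the sorted level.
        i = bisect_left(self.keys, word)
        if i < len(self.keys) and self.keys[i] == word:
            return self.children[i]
        if not create:
            return None
        node = CorpusTrieNode()
        self.keys.insert(i, word)
        self.children.insert(i, node)
        return node


def build_corpus_trie(phrase_pairs):
    """Pre-process (source phrase, target phrase) pairs into a trie."""
    root = CorpusTrieNode()
    for source, target in phrase_pairs:
        node = root
        for word in source.lower().split():
            node = node.child(word, create=True)
        # Repeated pairs only increase a counter, so the trie stays compact.
        node.translations[target] = node.translations.get(target, 0) + 1
    return root


def translate(root, source):
    """Return the most frequent target phrase for an exact source phrase, or None."""
    node = root
    for word in source.lower().split():
        node = node.child(word)
        if node is None:
            return None
    if not node.translations:
        return None
    return max(node.translations, key=node.translations.get)
```

With the pairs `("good morning", "bonjour")` stored twice and `("good morning", "bon matin")` stored once, `translate(trie, "Good morning")` returns `"bonjour"`. A lookup touches one node per word of the request and performs one binary search per level, so it avoids the sequential corpus scan mentioned above.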

Highlights

Accepted: 17 February 2021
Machine Translation (MT) is an automated procedure of bilingual or multi-lingual translation [1]
PhS and PhT are two phrases from the source language and the target language, respectively; [...] is O
N denotes the count of phrases of the source language; this is because the values of the nodes in the trie are sorted in each horizontal level

Summary

Introduction

Machine Translation (MT) is an automated procedure of bilingual or multi-lingual translation [1]. Statistical machine translation (SMT) and Neural Machine. Standard SMT techniques do not depend on any linguistic information and do not apply any pre-processing procedures to generate the translation [4,5]. SMT systems require huge text corpora to extract linguistic rules based on entropy [3,4,6]. SMT uses high-volume parallel corpora between source and destination languages, which are largely available [7]. All pure SMT systems derive data from corpora that they have previously analyzed and do not rely on linguistic information. SMT relies on statistics to solve the alignment problem and to induce grammatical units [12,13,14,15,16].

Journal: Sensors (Basel, Switzerland)	Publication Date: Feb 21, 2021
Citations: 3	License type: CC BY 4.0
