Abstract

Building a good machine translation system, or carrying out natural language processing research on cross-language information retrieval, requires a good parallel corpus. The Internet archive contains many parallel documents, but constructing a good parallel corpus from it requires a good bilingual dictionary. This paper describes an algorithm that automatically extracts an English/Arabic bilingual dictionary from parallel texts in the Internet archive. Ideally, the system should also be usable for many other language pairs. Unlike most existing systems, ours can extract translation pairs from a very small parallel corpus: it can extract translations from as few as two sentences in one language and two sentences in the other, provided the system's requirements are met. Moreover, the system extracts not only word pairs that are translations of each other but also explanations of an Arabic or English word in the other language. The accuracy of the system is 59.1% when one English word is translated to one Arabic word, 23.9% when one English word is translated to more than one Arabic word (an Arabic phrase), and 14.6% when one Arabic word is translated to more than one English word (an English phrase).

