The Role of Transliterated Words in Linking Bilingual News Articles in an Archive

Muzammil Khan, Kusum Yadav, Ali Alferaidi, Sarwar Shah Khan, Talal Saad Alharbi, Yasser Alharbi

doi:10.3390/app13074435


Open Access

https://doi.org/10.3390/app13074435


Journal: Applied Sciences	Publication Date: Mar 31, 2023
Citations: 1	License type: CC BY 4.0

Affiliation: University of Swat, University of Ha'il

Abstract

Retrieving a specific digital information object from a huge, evolving, multilingual news archive in response to a user query is challenging. Processing becomes harder to understand and analyze when the archive includes low-resource, morphologically complex languages, such as those written in Urdu and Arabic scripts. Computing similarity between a query and news articles, and among the articles themselves, in such huge and evolving collections may be inaccurate and time-consuming at run time. This paper introduces a Similarity Measure based on Transliteration Words (SMTW), that is, English words written in Urdu script, for linking news articles extracted from multiple online sources during the preservation process. SMTW links Urdu-to-English news articles using an upgraded Urdu-to-English lexicon that includes transliterated words. SMTW was evaluated exhaustively on datasets of different sizes, and the results were compared with the Common Ratio Measure for Dual Language (CRMDL). The experimental results show that SMTW was more effective than CRMDL for linking Urdu-to-English news articles: precision improved from 50% to 60%, recall improved from 67% to 82%, and the impact of common terms also improved.
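
The general idea of lexicon-based linking can be illustrated with a minimal Python sketch. It is not the paper's SMTW formula, which the abstract does not state. It assumes a tiny hypothetical lexicon (`LEXICON`) that mixes ordinary translations with transliterated English words, and it scores a pair of articles by the share of common terms, normalized by the smaller term set. The function names, the lexicon entries, and the normalization are all assumptions made for illustration.

```python
# Hypothetical Urdu-to-English lexicon. The entries are illustrative only
# and are not taken from the paper's lexicon.
LEXICON = {
    "وزیر": "minister",        # ordinary translation
    "کرکٹ": "cricket",         # English word transliterated into Urdu script
    "ٹیم": "team",             # English word transliterated into Urdu script
    "پارلیمنٹ": "parliament",  # English word transliterated into Urdu script
}


def map_urdu_terms(urdu_tokens, lexicon):
    """Map Urdu tokens to English terms, skipping tokens not in the lexicon."""
    return {lexicon[token] for token in urdu_tokens if token in lexicon}


def common_term_ratio(urdu_tokens, english_tokens, lexicon):
    """Share of common terms, normalized by the smaller term set.

    The normalization is an assumption of this sketch, not the paper's measure.
    """
    mapped = map_urdu_terms(urdu_tokens, lexicon)
    english = {token.lower() for token in english_tokens}
    if not mapped or not english:
        return 0.0
    return len(mapped & english) / min(len(mapped), len(english))


# Toy articles, already tokenized on whitespace.
urdu_article = ["کرکٹ", "ٹیم", "جیت"]
related_english = "The cricket team won the match".split()
unrelated_english = "Parliament passed the budget".split()

score_related = common_term_ratio(urdu_article, related_english, LEXICON)
score_unrelated = common_term_ratio(urdu_article, unrelated_english, LEXICON)
print(score_related, score_unrelated)
```

In this toy setup the two transliterated words are the only Urdu tokens that reach the English side, which shows why adding transliterations to the lexicon can raise the number of common terms between articles on the same story.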

Full Text