A proposed semantic keywords search engine for Indonesian Qur’an translation based on word embedding

Liza Trisnawati, Zul Indra, Ezak Fadzrin Bin Ahmad Shaubari, Sukri Sukri, Noor Azah Binti Samsudin, Shamsul Kamal Bin Ahmad Khalid

doi:10.11591/ijeecs.v35.i2.pp987-995

Abstract

Obtaining relevant information from the Holy Qur’an can be challenging for people who do not speak Arabic, such as many Indonesians. A common way to address this problem is to build a search engine for Al-Qur’an verses. This paper proposes a search engine based on semantic keyword representation for the Indonesian translation of the Al-Qur’an. The approach consists of three phases: data preparation, document representation, and search engine development. In the first phase, the Al-Qur’an dataset was built from the official translation of the Al-Qur’an from the Ministry of Religion and then enriched with the Wikipedia corpus. The second phase, document representation, produces feature vectors using the Word2Vec algorithm. In the final phase, a search engine was developed that finds the most relevant verses by calculating the cosine similarity between the documents and the keywords. The proposed search engine outperformed ordinary search engines by retrieving a wider range of information, owing to its use of semantic keywords. It also maintained the relevance of its search results, achieving precision and recall of 98.7% and 97.3%, respectively.
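The retrieval step described in the abstract (word-embedding vectors compared by cosine similarity) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors in `EMBEDDINGS` are hand-written stand-ins for trained Word2Vec vectors, the entries in `DOCS` are placeholder strings rather than actual verse translations, and averaging word vectors to represent a document is one common choice that the abstract does not confirm.

```python
import math

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
EMBEDDINGS = {
    "sabar":  [0.9, 0.1, 0.0],
    "tabah":  [0.8, 0.2, 0.1],
    "shalat": [0.1, 0.9, 0.0],
    "doa":    [0.2, 0.8, 0.1],
    "harta":  [0.0, 0.1, 0.9],
}

# Placeholder documents (not real verse translations).
DOCS = {
    "doc1": "orang yang tabah",
    "doc2": "shalat dan doa",
    "doc3": "harta dan anak",
}


def mean_vector(tokens):
    """Average the vectors of all tokens found in EMBEDDINGS; None if none are found."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, top_k=3):
    """Return up to top_k (doc_id, score) pairs, highest cosine similarity first."""
    query_vec = mean_vector(query.lower().split())
    if query_vec is None:
        return []  # no query word has an embedding
    scored = []
    for doc_id, text in docs.items():
        doc_vec = mean_vector(text.lower().split())
        if doc_vec is not None:
            scored.append((doc_id, cosine_similarity(query_vec, doc_vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for doc_id, score in search("sabar", DOCS):
        print(doc_id, round(score, 3))
```

In this toy setup, the query "sabar" ranks `doc1` first even though the word itself does not occur there, because its vector lies close to that of "tabah". This is the kind of wider, semantic match that a literal keyword search would miss.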
