Topic modeling enables the discovery of latent themes and patterns in large text collections, which makes it well suited to a close examination of the messages in religious texts. Topic modeling of Quranic verses is an active research area, with prior work covering the original Arabic text as well as Bahasa and English translations. However, further research is needed, particularly on Urdu translations of the Quran. In this study, we apply the BERTopic framework to Urdu translations of the Holy Quran. By leveraging BERTopic, which incorporates a fine-tuned BERT model, we aim to capture the contextual nuances and linguistic complexities unique to the Quran. We use existing Urdu translations of the Quran by eight different translators, sourced from Tanzil, a well-known resource for Quranic text and translations. We compare our BERTopic model against traditional techniques such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), using coherence and diversity metrics. The results indicate that our BERT-based approach outperforms these conventional methods, achieving an average coherence improvement of 0.03 and a diversity score of 0.83. These findings highlight the effectiveness of BERTopic in extracting meaningful topics from Urdu translations of the Holy Quran, contribute to the computational analysis of religious texts, and support scholarly work on comparative studies of Urdu Quran translations.
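The diversity score mentioned above can be illustrated with a minimal sketch. It assumes the commonly used definition of topic diversity: the fraction of unique words among the top-k words of all topics, where 1.0 means no topic shares a word with another. The abstract does not state the exact k, tokenization, or implementation the authors used. The function name `topic_diversity` and the transliterated toy topics are hypothetical.

```python
def topic_diversity(topics, top_k=10):
    """Return the fraction of unique words among the top_k words of every topic.

    Assumes the common definition of topic diversity; 1.0 means the topics
    share no words, and values near 0 mean heavily redundant topics.
    """
    # Collect the top_k words of each topic into a single flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    # Unique words divided by total words considered.
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics (transliterated tokens), not results from the study.
toy_topics = [
    ["allah", "rahmat", "iman", "dua"],
    ["allah", "jannat", "amal", "sabr"],
]
print(topic_diversity(toy_topics))  # → 0.875 (7 unique words out of 8)
```

A single shared word ("allah") across two four-word topics lowers diversity from 1.0 to 0.875, which shows why the metric rewards topic sets whose top words do not overlap.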