Sustainable Topic Modeling for Legal Moroccan Arabic Language: A Challenging Study on BERTopic Technique

Soufiane Aouichaty, Yassine Maleh, Mohamed Taib Mohtadi, Abdelmajid Hajami, Hakim Allali

doi:10.1016/j.procs.2024.05.069

Abstract

Topic modeling approaches struggle with legal texts because of their distinctive characteristics, such as document length and specialized terminology. Because topic modeling aims to uncover the semantic structure of a text, these characteristics call for dedicated approaches. Which topics are important also depends heavily on when the legal documents are presented. This paper explains and evaluates the application of BERTopic to topic modeling of legal documents. We experiment with BERTopic using several pre-trained Arabic language models as embeddings, and evaluate performance with the Normalized Pointwise Mutual Information (NPMI) measure. Our findings show that BERTopic with monolingual Arabic pre-trained models outperforms BERTopic with multilingual pre-trained models, offering insights into sustainable and efficient topic modeling for legal documents.
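
As an illustration of the NPMI measure named in the abstract, the following is a minimal sketch of how an NPMI coherence score can be computed for one topic. It is not the authors' evaluation code: the abstract does not state how probabilities were estimated, so this sketch assumes document-level co-occurrence counts, and the function name `npmi_coherence` and the toy English corpus are hypothetical. For a word pair, NPMI is log(P(w1, w2) / (P(w1) P(w2))) divided by -log P(w1, w2), and the topic score is the average over all pairs of its top words.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all pairs of topic words.

    Assumption: probabilities are estimated from document-level
    co-occurrence (a word "occurs" if it appears anywhere in a document).
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # words never co-occur: lower bound
        elif p12 == 1.0:
            scores.append(1.0)   # both words in every document: limiting value
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical tokenized toy corpus, for illustration only.
docs = [
    ["court", "judge", "ruling"],
    ["court", "judge", "appeal"],
    ["tax", "income", "law"],
    ["tax", "income", "court"],
]
print(round(npmi_coherence(["court", "judge"], docs), 3))  # 0.415
```

Scores lie between -1 (the words never co-occur) and 1 (they always co-occur), so a higher average indicates a more coherent topic. Practical evaluations often estimate the probabilities with a sliding window over a reference corpus rather than whole documents.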

Journal: Procedia Computer Science
Publication Date: Jan 1, 2024
License type: CC BY-NC-ND

Similar Papers

BERT for Arabic Topic Modeling: An Experimental Study on BERTopic Technique
Abeer Abuzayed ... Hend Al-Khalifa
Procedia Computer Science | VOL. 189
01 Jan 2020

TiBERT: Tibetan Pre-trained Language Model
Sisi Liu ... Junjie Deng
09 Oct 2022

Pre-trained Language Models for Tagalog with Multi-source Data
Shengyi Jiang ... Yingwen Fu
01 Jan 2020

Improving sentence representation for Vietnamese natural language understanding using optimal transport
Phu Xuan-Vinh Nguyen ... Kiet Van Nguyen
Journal of Intelligent & Fuzzy Systems
27 Jun 2023
