Abstract
Topic modeling approaches have difficulty processing legal texts because of their distinctive characteristics, such as their length and the specialized terminology they use. Topic modeling aims to uncover the semantic structure of a text, so approaches tailored to the legal domain are needed. Which topics are important also depends heavily on when the legal documents are presented. This paper explains and evaluates the application of BERTopic to topic modeling of legal documents. We experiment with BERTopic using several pre-trained Arabic language models as embeddings, and we evaluate performance with the Normalized Pointwise Mutual Information (NPMI) measure. Our findings show that BERTopic with Arabic monolingual pre-trained models outperforms BERTopic with multilingual pre-trained models, offering insights into sustainable and efficient topic modeling for legal documents.
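To make the evaluation measure concrete, the following is a minimal sketch of how an NPMI coherence score can be computed for the top words of a single topic. It assumes document-level co-occurrence counts and a toy English corpus; the function names `npmi_pair` and `topic_npmi`, the corpus, and the edge-case conventions are illustrative assumptions and are not taken from the paper, which may use a different co-occurrence window or an existing library implementation.

```python
import math
from itertools import combinations


def npmi_pair(w1, w2, docs):
    """NPMI of two words, with probabilities estimated from document-level
    co-occurrence. `docs` is a list of sets of tokens (an assumption made
    for this sketch)."""
    n = len(docs)
    p1 = sum(w1 in d for d in docs) / n
    p2 = sum(w2 in d for d in docs) / n
    p12 = sum(w1 in d and w2 in d for d in docs) / n
    if p12 == 0:
        return -1.0  # words never co-occur: lower bound of NPMI
    if p12 == 1:
        return 1.0   # both words in every document: convention chosen here
    # PMI normalized by -log P(w1, w2), which maps the score into [-1, 1]
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def topic_npmi(top_words, docs):
    """Average NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi_pair(a, b, docs) for a, b in pairs) / len(pairs)


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"court", "judge"},
    {"court", "judge"},
    {"tax", "law"},
    {"tax", "law"},
]

coherent = topic_npmi(["court", "judge"], docs)      # words always co-occur
mixed = topic_npmi(["court", "judge", "tax"], docs)  # "tax" never joins them
```

Scores close to 1 indicate that a topic's top words tend to appear together, while scores close to -1 indicate that they rarely do; a topic model's overall score is typically the average of this quantity across its topics.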