Abstract

Distributional semantic models are considered one of the empiricist approaches to studying language structure and design. They are mainly based on building models of word meaning through statistical analysis of how words are distributed in very large corpora. In this paper, we present the King Saud University Corpus of Classical Arabic (KSUCCA), which is considered the cornerstone for studying distributional lexical semantic models of the words of the Holy Quran. KSUCCA is a free corpus of more than 50 million words, containing texts dating from the pre-Islamic era up to the fourth Hijri century. We describe the design guidelines for KSUCCA, including its aim, balance, representativeness, text sampling, copyright, character encoding, and file organization. We also present some preliminary experiments we carried out on KSUCCA and their results.
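To make the distributional idea concrete, the sketch below represents each word by counts of the words that occur near it and compares words by the cosine similarity of those count vectors. It is a minimal illustration under stated assumptions, not the model or experiments used on KSUCCA: the window size of 2, the function names `cooccurrence_vectors` and `cosine`, and the toy English sentences are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions of it.

    The window size is an illustrative assumption, not a KSUCCA setting.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word itself
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; a real study would use tokenized corpus text instead.
sentences = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat eats fish".split(),
    "the dog eats meat".split(),
]
vectors = cooccurrence_vectors(sentences)
print(cosine(vectors["cat"], vectors["dog"]))   # 0.75: shared contexts
print(cosine(vectors["cat"], vectors["milk"]))  # 0.25: fewer shared contexts
```

Words that appear in similar contexts ("cat" and "dog") receive more similar vectors than words that merely co-occur ("cat" and "milk"), which is the intuition a very large corpus lets distributional models exploit at scale.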
