Abstract

Distributional semantic models are considered one of the empiricist approaches to studying language structure and design. They are based mainly on building semantic models of word meanings through statistical analysis of word distribution in very large corpora. In this paper, we present the King Saud University Corpus of Classical Arabic (KSUCCA), which is considered the cornerstone for studying distributional lexical semantic models of the words of the Holy Quran. It is a free corpus of more than 50 million words, containing texts dating from the pre-Islamic era to the fourth Hijri century. We describe the design guidelines for KSUCCA, including its aim, balance, representation, text sampling, copyright, character encoding, and file organization. We also report some preliminary experiments carried out on KSUCCA and the results obtained.

