Exploring and exploiting a historical corpus for Arabic

Bassam Hammo, Omaima Ismail, Mohammad Abushariah, Sane Yagi

doi:10.1007/s10579-015-9304-9

Abstract

This paper presents a historical Arabic corpus named HAC. At this early stage of the project, we report on the design, the architecture, and some of the experiments we have conducted on HAC. The corpus, and accordingly the search results, will be represented using a primary XML exchange format. This format will serve as an intermediate exchange tool within the project and will allow users to process the results offline with external tools. HAC is made up of Classical Arabic texts that cover 1600 years of language use, the Quranic text, Modern Standard Arabic texts, and a variety of monolingual Arabic dictionaries. The development of this historical corpus helps linguists and Arabic language learners to explore, understand, and discover interesting knowledge hidden in millions of instances of language use. We used techniques from natural language processing to process the data and a graph-based representation for the corpus. We also provide researchers with an export facility that makes further linguistic analysis possible.
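To make the idea of an XML exchange format for search results more concrete, here is a minimal sketch in Python. It is not HAC's actual schema: the abstract does not specify one, so the element and attribute names (`results`, `hit`, `left`, `keyword`, `right`, `source`, `year`), the function names, and the sample record are all hypothetical placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET


def hits_to_xml(query, hits):
    """Serialize concordance-style hits to an XML string.

    All element and attribute names are hypothetical; the paper's
    real exchange schema is not given in the abstract.
    """
    root = ET.Element("results", {"query": query})
    for hit in hits:
        el = ET.SubElement(
            root, "hit", {"source": hit["source"], "year": str(hit["year"])}
        )
        # Keyword-in-context layout: left context, matched word, right context.
        ET.SubElement(el, "left").text = hit["left"]
        ET.SubElement(el, "keyword").text = hit["keyword"]
        ET.SubElement(el, "right").text = hit["right"]
    return ET.tostring(root, encoding="unicode")


def xml_to_hits(xml_text):
    """Read the same hypothetical format back, as an offline tool might."""
    root = ET.fromstring(xml_text)
    hits = [
        {
            "source": el.get("source"),
            "year": int(el.get("year")),
            "left": el.findtext("left"),
            "keyword": el.findtext("keyword"),
            "right": el.findtext("right"),
        }
        for el in root.iter("hit")
    ]
    return root.get("query"), hits


# Made-up placeholder record, not data from the corpus.
sample = [
    {
        "source": "placeholder-text",
        "year": 900,
        "left": "left context",
        "keyword": "kitab",
        "right": "right context",
    }
]
exported = hits_to_xml("kitab", sample)
restored_query, restored_hits = xml_to_hits(exported)
```

A round trip like this is the property that matters for an exchange format: whatever the search interface exports, an external tool should be able to read back without loss.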

Journal: Language Resources and Evaluation
Publication Date: May 30, 2015
Citations: 17

Similar Papers

Historical Languages, Corpora, and Computational Methods
Barbara McGillivray
01 Jan 2014

Corpus Linguistic Tools for Historical Semantics in Arabic
Omaima Ismail
International Journal of Arabic-English Studies | VOL. 15
01 Jan 2014

Arabic Word Segmentation With Long Short-Term Memory Neural Networks and Word Embedding
Abdulrahman Almuhareb ... Waleed Alsanie
IEEE Access | VOL. 7
01 Jan 2019

Digital Text Authentication Using Deep Learning: Proposition for the Digital Quranic Text
Zineb Touati-Hamad ... Issam Bendib
01 Jan 2021
