Advancing AI-Driven Linguistic Analysis: Developing and Annotating Comprehensive Arabic Dialect Corpora for Gulf Countries and Saudi Arabia

Nouf Al-Shenaifi, Manar Hosny, Aqil M Azmi

doi:10.3390/math12193120

Abstract

This study draws on the linguistic diversity of Arabic dialects to build two large corpora from X (formerly Twitter). The Gulf Arabic Corpus (GAC-6) contains around 1.7 million tweets from six Gulf countries (Saudi Arabia, the UAE, Qatar, Oman, Kuwait, and Bahrain) and captures a wide range of linguistic variation. The Saudi Dialect Corpus (SDC-5) contains 790,000 tweets covering the five major regional dialects of Saudi Arabia: Hijazi, Najdi, Southern, Northern, and Eastern. It reflects the complex linguistic landscape of the country. Both corpora are annotated using dialect-specific seed words and geolocation data. The annotations show high inter-annotator agreement, with Cohen's kappa scores of 0.78 for GAC-6 and 0.90 for SDC-5. The annotation process uses AI-driven techniques to make the data more granular and precise, including machine learning algorithms for automated dialect recognition and feature extraction. These resources contribute to Arabic dialectology and support the development of AI algorithms for linguistic data analysis, with benefits for AI system design and efficiency. The data can also support further work on AI methodologies and a range of next-generation AI applications.
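Cohen's kappa is the agreement measure reported above. It adjusts the raw proportion of matching labels for the agreement two annotators would reach by chance: κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement, and p_e is the expected chance agreement computed from each annotator's label frequencies. The minimal Python sketch below shows that calculation for two annotators labeling the same tweets. The function name `cohens_kappa` and the dialect labels in the example are hypothetical illustrations. They are not data from GAC-6 or SDC-5, and the sketch is not presented as the authors' evaluation code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both annotators used one identical label;
        # this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for six tweets (not drawn from the corpora).
annotator_1 = ["Najdi", "Hijazi", "Najdi", "Eastern", "Southern", "Najdi"]
annotator_2 = ["Najdi", "Hijazi", "Hijazi", "Eastern", "Southern", "Najdi"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.769
```

With these toy labels, the annotators agree on 5 of 6 tweets (p_o = 5/6). Chance agreement is p_e = 10/36, so κ = 10/13 ≈ 0.769. For real evaluations, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.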


Journal: Mathematics
Publication Date: Oct 5, 2024
License type: CC BY 4.0

Similar Papers

Supporting of Postural Deformities Diagnosis Using 3D Scanning
Robert Sitnik ... Wojciech Glinkowski
29 Jul 2017

Fundamentals of Brain Signals and Its Medical Application Using Data Analysis Techniques
P Geethanjali
01 Jan 2015

A Delaunay diagram‐based Min–Max CP‐Tree algorithm for Spatial Data Analysis
Venkatesan Meenakshi Sundaram ... Arunkumar Thangavelu
WIREs Data Mining and Knowledge Discovery | VOL. 5
23 Apr 2015

A genetic algorithm for pattern recognition analysis of pyrolysis gas chromatographic data
Barry K Lavine ... Lisa K Helfend
Journal of Analytical and Applied Pyrolysis | VOL. 50
26 Mar 1999
