Abstract
Recent years have witnessed the development of various computational approaches to the study of linguistic variation and regional dialectology in languages including English, German, Spanish and Chinese. These approaches have proved effective in handling large corpora and in supporting reliable generalizations about the data. In Arabic, however, most work on regional dialectology has so far relied on traditional methods, which makes it difficult to provide a comprehensive mapping of dialectal variation across all the colloquial dialects of Arabic. Accordingly, this study proposes a computational statistical model for mapping linguistic variation and regional dialectology in Colloquial Arabic through Twitter, based on the lexical choices of speakers. The aim is to explore the lexical patterns of Twitter users in order to generate regional dialect maps. The study is based on a corpus of 1,597,348 geolocated Twitter posts. Using principal component analysis (PCA), the data were classified into distinct classes and the lexical features of each class were identified. The results indicate that the lexical choices of Twitter users can be used effectively to map regional dialect variation in Colloquial Arabic.
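As a rough illustration of the PCA step described in the abstract, the following minimal sketch builds a region-by-word relative-frequency matrix from geolocated posts and extracts the first principal component by power iteration. Everything in it is an assumption made for illustration: the region labels, the placeholder tokens `variant_a` to `variant_c`, the toy posts, and the function names are hypothetical and are not taken from the study's corpus or code.

```python
from collections import Counter
import math

# Hypothetical toy data: (region label, tokenized post). Not from the study's corpus.
POSTS = [
    ("region_1", ["variant_a", "variant_a", "variant_c"]),
    ("region_1", ["variant_a", "variant_c"]),
    ("region_2", ["variant_a", "variant_c", "variant_c"]),
    ("region_3", ["variant_b", "variant_b", "variant_c"]),
    ("region_4", ["variant_b", "variant_c"]),
]


def relative_frequencies(posts):
    """Return (regions, vocab, matrix) with one row of word shares per region."""
    counts = {}
    for region, tokens in posts:
        counts.setdefault(region, Counter()).update(tokens)
    vocab = sorted({tok for _, tokens in posts for tok in tokens})
    regions = sorted(counts)
    matrix = []
    for region in regions:
        total = sum(counts[region].values())
        matrix.append([counts[region][word] / total for word in vocab])
    return regions, vocab, matrix


def first_principal_component(matrix, iterations=200):
    """Return (loadings, scores) for the first principal component."""
    n, d = len(matrix), len(matrix[0])
    # Center each column (word) on its mean across regions.
    means = [sum(row[j] for row in matrix) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in matrix]
    # Sample covariance matrix of the word frequencies.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration from a fixed, non-uniform start vector.
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    # Project each region onto the component.
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    return vec, scores


regions, vocab, matrix = relative_frequencies(POSTS)
loadings, scores = first_principal_component(matrix)
for region, score in zip(regions, scores):
    print(region, round(score, 3))
```

In this toy setting, regions that favour the same lexical variant receive scores of the same sign on the first component, which is the kind of separation a dialect map would then display geographically. A real analysis would use many more lexical features and an established numerical library rather than hand-written power iteration.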
Highlights
Sociolinguists have studied lexical variation and correlated the process through which speaker groups choose their vocabulary with a bundle of variables, such as gender, context, social status and topic [1,2,3,4].
Social media channels and networks provide good opportunities for researchers and sociolinguists to explore linguistic variation among different speaker groups.
The study of linguistic variation through social media networks has developed in parallel with computational methods.
Summary
Sociolinguists have studied lexical variation and correlated the process through which speaker groups choose their vocabulary with a bundle of variables, such as gender, context, social status and topic [1,2,3,4]. Social media channels and networks provide good opportunities for researchers and sociolinguists to explore linguistic variation among different speaker groups. The study of linguistic variation through social media networks has developed in parallel with computational methods. This study proposes a computational model for mapping linguistic variation and regional dialectology in Colloquial Arabic through Twitter, based on the lexical choices of speakers. To map the linguistic variation of Colloquial Arabic dialects, cluster analysis methods were used, in which each class or group has distinct features that differentiate it from the other groups.
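The idea that each class has distinct lexical features can be sketched as follows. The summary does not name a specific clustering algorithm, so this example swaps in plain k-means purely as an illustration; the region names, the placeholder words, and the frequency values are hypothetical and do not come from the study.

```python
import math

# Hypothetical relative word frequencies per region (each row sums to 1).
VOCAB = ["variant_a", "variant_b", "variant_c"]
PROFILES = {
    "region_1": [0.60, 0.00, 0.40],
    "region_2": [0.50, 0.05, 0.45],
    "region_3": [0.00, 0.65, 0.35],
    "region_4": [0.05, 0.50, 0.45],
}


def distance(u, v):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def k_means(vectors, k, iterations=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda i: distance(v, centroids[i])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = mean_vector(members)
    return labels, centroids


def distinctive_word(centroid, overall, vocab):
    """Word whose cluster frequency most exceeds the overall mean frequency."""
    gains = [c - o for c, o in zip(centroid, overall)]
    return vocab[gains.index(max(gains))]


vectors = list(PROFILES.values())
labels, centroids = k_means(vectors, k=2)
overall = mean_vector(vectors)
for region, label in zip(PROFILES, labels):
    print(region, "-> class", label)
for i, centroid in enumerate(centroids):
    print("class", i, "is marked by", distinctive_word(centroid, overall, VOCAB))
```

Comparing each cluster centroid with the overall mean is only one simple way to surface the words that set a class apart; other measures of distinctiveness would serve the same purpose.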
More From: International Journal of Advanced Computer Science and Applications