Abstract
Twitter has gained wide attention as a major social media platform where many topics are discussed on a daily basis through millions of tweets. A tweet can be viewed as a speech act (SA), that is, an utterance that presents information, carries an implicit meaning, or performs an action. According to SA theory, an SA can represent an assertion, a question, a recommendation, or many other things. In this paper, we tackle the problem of constructing a reference corpus of Arabic tweets for the classification of Arabic speech acts. We refer to this corpus as the Arabic Tweets Speech Act Corpus (ArTSAC). It is an enhancement of a Modern Standard Arabic (MSA) tweet corpus of speech acts called ArSAS, and it improves on ArSAS through its richer set of annotated features. The goal of ArTSAC is twofold: firstly, to understand the purpose and intention of tweets as interpreted through SA theory, and thereby to positively influence the development of many natural language processing (NLP) applications; secondly, as a longer-term goal, to serve as a benchmark annotated dataset for testing and evaluating state-of-the-art Arabic SA classification algorithms and applications. ArTSAC has been used to classify Arabic tweets containing speech acts with the Support Vector Machine (SVM) classification algorithm. The experimental results show that the enhanced ArTSAC corpus achieved an average precision of 90.6% and an F-score of 89.6%, substantially outperforming the results of its predecessor, the ArSAS corpus.
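The abstract states only that an SVM classifies the tweets; it does not describe the features or training procedure. As a hedged illustration of the general technique, the sketch below trains a one-vs-rest linear SVM by stochastic gradient descent on the L2-regularized hinge loss, using plain bag-of-words features. The toy sentences, label names, hyperparameters (`lam`, `eta`, `epochs`) and function names are all hypothetical and are not taken from ArTSAC; a real pipeline would more likely use a library classifier such as scikit-learn's `LinearSVC` together with Arabic-specific normalization and tokenization.

```python
from collections import Counter

# Hypothetical toy examples (NOT drawn from ArTSAC); real inputs would be Arabic tweets.
TRAIN = [
    ("when is the meeting ?", "question"),
    ("where can we buy tickets ?", "question"),
    ("today was very hot", "assertion"),
    ("prices increased last month", "assertion"),
    ("you should try that restaurant", "recommendation"),
    ("I recommend reading this book", "recommendation"),
]


def featurize(text):
    """Bag-of-words counts over lowercased whitespace tokens (a simplifying assumption)."""
    return Counter(text.lower().split())


def train_binary(samples, signs, lam=0.01, eta=0.1, epochs=20):
    """SGD on lam/2 * ||w||^2 + max(0, 1 - y * w.x), with labels y in {+1, -1}."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(samples, signs):
            # Margin is evaluated at the current weights, before the update.
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Gradient step for the regularizer: shrink every existing weight.
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss subgradient step, applied only when the margin is violated.
            if margin < 1.0:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train_one_vs_rest(data):
    """Train one binary linear SVM per speech-act label."""
    samples = [featurize(text) for text, _ in data]
    labels = sorted({label for _, label in data})
    models = {}
    for label in labels:
        signs = [1 if other == label else -1 for _, other in data]
        models[label] = train_binary(samples, signs)
    return models


def predict(models, text):
    """Return the label whose binary classifier gives the highest score."""
    x = featurize(text)
    scores = {
        label: sum(w.get(f, 0.0) * v for f, v in x.items())
        for label, w in models.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    models = train_one_vs_rest(TRAIN)
    print(predict(models, "where is the station ?"))  # prints "question"
```

One-vs-rest is chosen here only because it is a common way to extend a binary SVM to several speech-act classes; the paper does not say which multiclass strategy it used.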
Highlights
People discuss many different issues and topics on Twitter through their tweets
Before discussing the results obtained from our modified Arabic Tweets Speech Act Corpus (ArTSAC), we highlight the previous results obtained from the Arabic Speech Act and Sentiment corpus (ArSAS) [33], and then compare the results of running a Support Vector Machine (SVM) on our modified ArTSAC corpus with those obtained on ArSAS
We presented the development and construction of a richly annotated reference corpus of Arabic tweets for speech act classification
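Comparing the SVM runs on ArTSAC and ArSAS comes down to metrics such as precision and F-score. The source does not say how per-class scores are averaged into the reported "average precision", so the sketch below assumes macro averaging (the unweighted mean over speech-act labels) and computes macro F1 as the mean of per-class F1 values; the paper may instead weight classes by frequency. The function name and the labels and predictions are invented for illustration and are not results from either corpus.

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c items that were missed
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical gold labels and classifier outputs.
    gold = ["question", "question", "assertion", "recommendation", "assertion", "question"]
    pred = ["question", "assertion", "assertion", "recommendation", "assertion", "question"]
    p, r, f = macro_scores(gold, pred)
    print(round(p, 3), round(r, 3), round(f, 3))  # prints 0.889 0.889 0.867
```

Here the single question mislabeled as an assertion lowers assertion precision (2/3) and question recall (2/3), so both per-class F1 values drop to 0.8 while recommendation stays at 1.0.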
Summary
People discuss many different issues and topics on Twitter through their tweets. Recently, Twitter has attracted great attention from the popular press and, increasingly, from scholars. Because of the tremendous volume of tweets, classifying them and extracting useful information from them is essentially a big data management problem. NLP tasks such as sentiment analysis [10], rumor detection [11], and the evaluation of customer satisfaction are important in many online applications today, especially in big data environments where automated tools are urgently needed. We tackle the problem of creating a reference corpus of Arabic tweets for the classification of Arabic speech acts. The goal of ArTSAC is twofold: firstly, to understand the purpose and intention of people's tweets as interpreted through SA theory, and thereby to positively influence the development of many Arabic NLP applications; secondly, to serve as a benchmark annotated dataset for evaluating Arabic SA classification algorithms.
Published in: International Journal of Advanced Computer Science and Applications