Abstract

To improve the performance of Natural Language Processing tasks for Roman Urdu, this article presents, for the first time, three neural word embeddings trained with the most widely used approaches: Word2vec, FastText, and GloVe. The quality of the generated embeddings is assessed through both intrinsic and extrinsic evaluation. Because no benchmark dataset is publicly available, the article also provides the first public Roman Urdu dataset, which consists of 3241 sentiment instances annotated as positive, negative, or neutral. To establish baseline performance on this dataset for Roman Urdu sentiment analysis, we adapt diverse machine learning (Support Vector Machine, Logistic Regression, Naive Bayes), deep learning (convolutional neural network, recurrent neural network), and hybrid deep learning approaches. The performance impact of the embedding-based representation is compared against widely used bag-of-words feature representations across these machine learning and deep learning classifiers. To further improve Roman Urdu sentiment analysis, the article proposes a novel precisely extreme multi-channel hybrid methodology that combines convolutional and recurrent neural networks with pre-trained neural word embeddings. In terms of F1-score, the proposed hybrid approach outperforms the adapted machine learning approaches by 9% and the deep learning approaches by 4%.
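Since the comparison above is reported in terms of F1-score over three sentiment classes, the following minimal sketch shows how such a score could be computed. It assumes macro averaging (the unweighted mean of per-class F1), which the abstract does not specify; the function name `macro_f1` and the toy labels are illustrative only and are not taken from the article or its dataset.

```python
from collections import Counter

# The three classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores.

    Assumption: macro averaging; the article may use a different scheme.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy example with made-up labels (not data from the article).
y_true = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
y_pred = ["positive", "negative", "negative", "neutral", "positive", "negative"]
print(round(macro_f1(y_true, y_pred), 4))
```

Macro averaging weights each class equally regardless of how many examples it has, which is one common choice when class sizes differ; a weighted or micro average would give different numbers on the same predictions.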
