Abstract

Twitter is the most popular microblogging platform, with millions of users exchanging a huge volume of text messages, called “tweets”, every day. This has resulted in an enormous source of unstructured data, or Big Data. Companies and organizations can analyze such Big Data to extract customer perspectives on their products or services and to monitor marketing trends. Automatically understanding the opinions behind user-generated content, called “Big Data Analytics”, is of great concern. Deep learning can make the discriminative tasks of Big Data Analytics easier and achieve higher performance. Deep learning is a branch of machine learning based on artificial neural networks with multiple layers, and it has been used extensively to address Big Data challenges such as semantic indexing, data tagging and immediate information retrieval. Deep learning requires its input to be represented as word embeddings, i.e. as real-valued vectors in a high-dimensional space. However, word embedding models need large corpora for training in order to produce reliable word vectors. For this reason, a number of pre-trained word embeddings are freely available to leverage. In effect, these are words and their corresponding n-dimensional word vectors, produced by different research teams. In this work, we analyze a large number of tweets, treated as Big Data, and classify their polarity using a deep learning approach with four notable pre-trained word vectors, namely Google’s Word2Vec, Stanford’s Crawl GloVe, Stanford’s Twitter GloVe, and Facebook’s FastText. One major conclusion is that tweet classification using deep learning outperforms the baseline machine learning algorithms. With regard to the pre-trained word embeddings, FastText provides more consistent results across datasets, while Twitter GloVe obtains very good accuracy rates despite its lower dimensionality.
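To illustrate what it means to represent a tweet as word embeddings, the following minimal sketch maps each token of a tweet to a real-valued vector taken from a lookup table. The table `TOY_EMBEDDINGS`, its 4-dimensional values, and the helper `embed_tweet` are hypothetical and chosen only for illustration; they are not taken from Word2Vec, GloVe, or FastText, and this is not the pipeline used in this work. Mapping unknown words to a zero vector is likewise an assumption.

```python
from typing import Dict, List

# Hypothetical toy table: real pre-trained embeddings cover a far larger
# vocabulary and use many more dimensions than the 4 used here.
EMBEDDING_DIM = 4
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "good": [0.8, 0.1, 0.0, 0.3],
    "bad": [-0.7, 0.2, 0.1, -0.4],
    "phone": [0.0, 0.9, 0.5, 0.1],
}


def embed_tweet(tweet: str) -> List[List[float]]:
    """Map each whitespace-separated token to its vector.

    Tokens missing from the table are mapped to a zero vector
    (an assumption made for this sketch).
    """
    tokens = tweet.lower().split()
    zero = [0.0] * EMBEDDING_DIM
    return [list(TOY_EMBEDDINGS.get(token, zero)) for token in tokens]


if __name__ == "__main__":
    # One row per token; each row is an EMBEDDING_DIM-dimensional vector.
    for row in embed_tweet("Good phone really"):
        print(row)
```

The resulting list of vectors, one per token, is the kind of input a neural classifier would consume; with a real pre-trained model, the toy table would be replaced by the vectors loaded from that model's file.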

