Abstract

Sentiment lexicons are important resources for opinion mining. Recently, many state-of-the-art approaches have employed deep learning techniques to construct sentiment lexicons. In general, they first learn sentiment-aware word embeddings and then use them as word features to construct sentiment lexicons. However, these methods do not consider how much each word contributes to distinguishing a document's sentiment polarity. In fact, most words in a document contribute little to its semantics or sentiment. For example, in the tweet "It's a good day, but I can't feel it. I'm really unhappy.", the words 'unhappy', 'feel', and 'can't' are much more important than the words 'good' and 'day' for predicting the sentiment polarity of this tweet. Meanwhile, many words, such as 'the', 'in', 'it', and 'I'm', are uninformative. In this paper, we propose a novel sparse self-attention LSTM (SSALSTM) to capture these intuitions efficiently and then construct a large-scale sentiment lexicon for Twitter. In SSALSTM, a self-attention mechanism captures the importance of each word for distinguishing a document's sentiment polarity. In addition, an $L_1$ regularizer is applied to the attention weights to enforce sparsity, reflecting the fact that most words in a document are uninformative for semantics and sentiment. Once effective sentiment-aware word embeddings are learned, we train a classifier that uses them as features to predict the sentiment polarity of each word. Extensive experiments on four publicly available datasets, SemEval 2013–2016, show that the sentiment lexicon generated by our proposed model achieves state-of-the-art performance on both supervised and unsupervised sentiment classification tasks.
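
The abstract's core idea can be illustrated with a minimal sketch: an LSTM sentence encoder whose self-attention weights are penalized with an $L_1$ term so that most words receive near-zero weight. The full text is not available here, so all names, dimensions, and hyperparameters below are illustrative assumptions rather than the paper's actual architecture; in particular, sigmoid gates are used in place of a softmax so that the $L_1$ penalty can genuinely drive most attention weights to zero.

```python
# Illustrative sketch only: an LSTM with sparse self-attention for
# sentence-level sentiment classification. Not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseSelfAttentionLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn_gate = nn.Linear(hidden_dim, 1)    # one attention gate per word
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)                       # (B, T, E)
        hidden, _ = self.lstm(embedded)                            # (B, T, H)
        # Sigmoid gates (an assumption) instead of a softmax, so the L1
        # penalty below can push most per-word weights toward zero.
        attn = torch.sigmoid(self.attn_gate(hidden)).squeeze(-1)   # (B, T)
        context = torch.bmm(attn.unsqueeze(1), hidden).squeeze(1)  # (B, H)
        logits = self.classifier(context)
        return logits, attn


def loss_with_sparsity(logits, labels, attn, l1_weight=1e-3):
    # Sentiment cross-entropy plus an L1 term encouraging most words in a
    # document to receive (near-)zero attention weight.
    ce = F.cross_entropy(logits, labels)
    l1 = attn.abs().mean()
    return ce + l1_weight * l1
```

Under this reading, training on polarity-labeled tweets yields sentiment-aware embeddings and word-level attention; a separate word-polarity classifier over those embeddings would then produce the lexicon, as described in the abstract.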
