Comparison of Various Word Embeddings for Hate-Speech Detection

Minni Jain, Puneet Goel, Puneet Singla, Rahul Tehlan

doi:10.1007/978-981-15-8335-3_21

Abstract

Word embeddings play a crucial role in natural language processing and related domains. The wide variety of language modelling and feature learning techniques often leaves practitioners in a quandary over which to choose. The motivation behind this work was to produce a comparative analysis of these methods and then use them to flag hate speech on social media. Progress in word embedding techniques has led to remarkable results across various natural language applications. The ability to distinguish the different contexts of polysemous words is one of the features that has evolved over time with these models. This paper presents a systematic review of word embedding methodologies. Various experimental metrics are used, and each word embedding model is analysed in detail. The analysis covers several aspects of each model, such as its handling of multi-sense words and rarely occurring words, and concludes with a coherent analysis report. The models under analysis are Word2Vec (Skip-Gram, CBOW), GloVe, Fast-Text and ELMo. These models are then applied to a real-life task, hate-speech detection on Twitter data, and their individual capacities and accuracies are compared. We show how ELMo uses different word embeddings for polysemous words to capture context, and how hate speech can be detected better by ELMo because such speech requires a better understanding of word context to separate it from normal speech or text.

Keywords: Word2Vec, Skip-gram, CBOW, GloVe, Fast-text, ELMo, Word embedding, Hate-speech detection
