Abstract

Word embeddings have opened new and exciting avenues for understanding and processing language. These simple yet effective embedding models rapidly became a dominant building block for Natural Language Processing (NLP) applications, as they impressively encode linguistic similarities and syntactic regularities between words. However, ignoring the morphological structure of words degrades their performance on languages with complex morphology, such as Arabic. In this paper, we investigate enhancing Arabic word embeddings by incorporating morphological annotations into the embedding model. We further tune the generated word vectors toward their lemma forms using linear compositionality to produce lemma-based embeddings. To assess the effectiveness of our model, we evaluate it on Arabic word analogy, sentiment analysis, and subjectivity analysis tasks. Our results show improvements over existing state-of-the-art methods for Arabic word embedding.
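
The abstract does not spell out how the lemma-based vectors are composed, but a minimal sketch of one common reading of "linear compositionality" is averaging the vectors of all surface forms that share a lemma. The function name `lemma_embeddings`, the toy vectors, and the word-to-lemma mapping below are illustrative assumptions, not the paper's actual implementation or data.

```python
import numpy as np

def lemma_embeddings(word_vectors, word_to_lemma):
    """Compose a vector for each lemma by averaging the vectors of its
    surface forms (a simple instance of linear compositionality)."""
    sums, counts = {}, {}
    for word, vec in word_vectors.items():
        lemma = word_to_lemma.get(word)
        if lemma is None:
            continue  # skip words with no lemma annotation
        sums[lemma] = sums.get(lemma, np.zeros_like(vec)) + vec
        counts[lemma] = counts.get(lemma, 0) + 1
    return {lemma: sums[lemma] / counts[lemma] for lemma in sums}

# Hypothetical toy example: two surface forms mapped to the lemma "كتاب" (book)
word_vectors = {
    "الكتاب": np.array([0.2, 0.1, 0.4]),  # "the book"
    "كتابان": np.array([0.3, 0.0, 0.5]),  # "two books"
}
word_to_lemma = {"الكتاب": "كتاب", "كتابان": "كتاب"}
print(lemma_embeddings(word_vectors, word_to_lemma))
```

In practice the word-to-lemma mapping would come from a morphological analyzer, and the averaged lemma vectors could then be used in place of (or alongside) the surface-form vectors for the downstream evaluation tasks.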
