Abstract

Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that helps computers understand, interpret and manipulate human language. NLP draws on many disciplines, including computer science and computational linguistics, in its pursuit to bridge the gap between human communication and computer understanding. In the information age, using NLP to optimize information search, text summarization, and text and data analysis systems has become critically important. To achieve accuracy in these systems, redundant words carrying little or no semantic meaning must be filtered out. These words are known as stop words. Stop word lists have been developed for languages such as Arabic, English, Chinese and French. However, no standard stop word list exists for dialects such as the Moroccan dialect. Manually identifying stop words for the Moroccan dialect is difficult, especially given the many different ways a single stop word can be written. In this work, we propose a novel method for generating Moroccan dialect stop words. To achieve this objective, we first apply preprocessing steps to reduce noise, then create a stop word dictionary to enrich our training data, and finally use word embeddings to build stop word clusters. The resulting list is generated from data collected from three popular social networks: Facebook, Twitter and YouTube.
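The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes a simple preprocessing step (lowercasing, stripping punctuation while keeping the digits often used in Latin-script Moroccan dialect, and collapsing letters repeated three or more times), and it substitutes a simple seed-based grouping for whatever clustering the paper uses: every vocabulary word is attached to its most similar seed stop word by cosine similarity, and it joins that seed's cluster only if the similarity reaches a threshold. The toy word vectors, the seed words, the 0.95 threshold and all function names are hypothetical; a real system would train embeddings (for example word2vec or fastText) on the collected Facebook, Twitter and YouTube corpus.

```python
import math
import re

# Hypothetical toy embeddings (illustrative only). In practice these would be
# learned from the social media corpus rather than written by hand.
EMBEDDINGS = {
    "li":   [0.90, 0.10, 0.00],   # seed stop word ("that/which")
    "lli":  [0.88, 0.12, 0.02],   # spelling variant of "li"
    "fi":   [0.10, 0.90, 0.05],   # seed stop word ("in")
    "f":    [0.12, 0.87, 0.04],   # short variant of "fi"
    "film": [0.05, 0.10, 0.95],   # content word
    "zwin": [0.00, 0.20, 0.90],   # content word ("nice")
}


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters with spaces, and
    collapse any character repeated 3+ times (e.g. 'zwiiiin' -> 'zwin')."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    return text.split()


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_stop_words(seeds, embeddings, threshold=0.95):
    """Attach each non-seed word to its most similar seed; keep the link
    only if the similarity reaches the (assumed) threshold."""
    clusters = {s: [s] for s in seeds if s in embeddings}
    for word, vec in embeddings.items():
        if word in clusters:
            continue
        best_seed, best_sim = None, -1.0
        for seed in clusters:
            sim = cosine(vec, embeddings[seed])
            if sim > best_sim:
                best_seed, best_sim = seed, sim
        if best_seed is not None and best_sim >= threshold:
            clusters[best_seed].append(word)
    return clusters


def remove_stop_words(tokens, clusters):
    """Drop every token that belongs to any stop word cluster."""
    stop = {w for members in clusters.values() for w in members}
    return [t for t in tokens if t not in stop]


clusters = expand_stop_words(["li", "fi"], EMBEDDINGS)
print(clusters)  # {'li': ['li', 'lli'], 'fi': ['fi', 'f']}
tokens = preprocess("Had film zwiiiin, li chfto f YouTube!!")
print(remove_stop_words(tokens, clusters))
# ['had', 'film', 'zwin', 'chfto', 'youtube']
```

Grouping around seeds lets spelling variants such as "lli" and "f" be caught without listing every variant by hand, which is the difficulty the abstract points to; the threshold trades recall of variants against the risk of pulling content words into the stop list.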
