Abstract

New forms of written communication (electronic mail, forums, chat, SMS, etc.) pose new challenges for Natural Language Processing methods. These data exhibit very particular linguistic phenomena, such as very short and very noisy messages. This paper focuses on the development of generic tools and resources for e-mail classification, and addresses the problem of precisely routing e-mails. After a filtering and lemmatization step, a vector representation of the texts is used for classification by means of supervised, semi-supervised and unsupervised learning techniques. Very good results are reported on realistic corpora.
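The supervised branch of this pipeline (filtering, lemmatization, vector representation, classification) can be illustrated with a minimal sketch. It is not the authors' implementation. The stop-word list, the suffix-stripping stand-in for a real lemmatizer, the nearest-centroid classifier with cosine similarity, and the toy e-mails and labels (`billing`, `support`) are all hypothetical choices made for illustration. The semi-supervised and unsupervised variants are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual filtering step is not described.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "is", "my", "i",
              "please", "for", "on", "in", "it"}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then apply a crude lemmatizer stand-in."""
    tokens = re.findall(r"[a-z]+", text.lower())
    lemmas = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        # Assumption: naive plural stripping instead of a real lemmatizer.
        if len(tok) > 3 and tok.endswith("s"):
            tok = tok[:-1]
        lemmas.append(tok)
    return lemmas


def vectorize(text):
    """Bag-of-words term-frequency vector, stored sparsely as a Counter."""
    return Counter(preprocess(text))


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(labelled):
    """Sum the vectors of each class; cosine is scale-invariant, so this
    points in the same direction as the class mean."""
    centroids = {}
    for text, label in labelled:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def classify(text, centroids):
    """Return the label whose centroid is most similar to the e-mail."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy, made-up training e-mails (not from the paper's corpora).
training = [
    ("Invoice attached, payment due on Friday", "billing"),
    ("Please refund the payment for my last invoice", "billing"),
    ("The server crashed and the login page shows an error", "support"),
    ("Error when I try to login to my account", "support"),
]
centroids = train_centroids(training)
print(classify("Login error on the account page", centroids))  # support
```

Nearest-centroid classification is used here only because it is short and transparent; any classifier operating on the same vectors could be substituted.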
