Abstract

This paper reports on experiments in multi-class e-mail categorisation using supervised and unsupervised machine learning techniques. Support Vector Machines, decision tree learners, instance-based classifiers, Naive Bayes classifiers and Self-Organising Maps were applied to the task. Two document representations, one word-based and one based on character n-grams, were used to assess the categorisation performance of the various learning approaches. The results indicate a substantial increase in classification accuracy when e-mail header information is included in the document representation. To a much lesser degree, word-based representations outperform n-gram representations.
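The following minimal Python sketch illustrates how the two document representations differ and how header fields might be folded into them. It is not the authors' implementation. The tokenisation regex, the trigram default (n = 3), the choice of Subject and From as the included header fields, and all helper names are illustrative assumptions.

```python
import re
from collections import Counter


def word_features(text):
    """Word-based representation: counts of lowercased alphabetic tokens (assumed tokeniser)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def char_ngram_features(text, n=3):
    """Character n-gram representation: counts of every overlapping n-character slice.

    Whitespace is collapsed to single spaces first, so n-grams may span word boundaries.
    """
    normalised = " ".join(text.lower().split())
    return Counter(normalised[i:i + n] for i in range(len(normalised) - n + 1))


def email_text(body, headers=None):
    """Optionally prepend selected header fields to the body.

    Using Subject and From is a hypothetical choice for illustration only.
    """
    if not headers:
        return body
    header_part = " ".join(headers[k] for k in ("Subject", "From") if k in headers)
    return header_part + " " + body


if __name__ == "__main__":
    body = "Meeting moved to Friday"
    headers = {"Subject": "Project meeting", "From": "alice@example.org"}
    # Body only vs. body plus header terms, word-based representation.
    print(word_features(email_text(body)))
    print(word_features(email_text(body, headers)))
    # Character trigrams of the body only.
    print(char_ngram_features(body, 3).most_common(5))
```

Either counter can then be turned into a sparse feature vector for any of the learners listed above; the header-aware variant simply contributes additional terms (here, "project" and "alice") that the body alone lacks.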
