A comparison of data preparation approaches for e-mail categorisation

Helmut Berger, Dieter Merkl, Michael Dittenbach

doi:10.1504/ijiids.2007.014946

Journal: International Journal of Intelligent Information and Database Systems

Publication Date: Jan 1, 2007

Affiliation: Polymer Competence Center Leoben (Austria), TU Wien

#Unsupervised Machine Learning Techniques #N-gram Representations

Abstract

This paper reports on experiments in multi-class e-mail categorisation with supervised and unsupervised machine learning techniques. To this end, Support Vector Machines, decision tree learners, instance-based classifiers, Naive Bayes classification approaches and Self-Organising Maps were applied. A word-based and a character n-gram document representation approach were employed in order to assess the categorisation performance of the various learning approaches. The results indicate a substantial increase in classification accuracy when e-mail header information is considered in the document representation. To a much lesser degree, word-based document representations are advantageous over n-gram representations.
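
To make the two document representations concrete, the following is a minimal sketch of how an e-mail might be turned into word-based and character n-gram features, with or without header information. It is not the authors' implementation: the abstract does not specify the tokenisation, the n-gram length, or which header fields were used, so the function names, the choice of n = 3, and the use of the subject line as the only header field are all assumptions made for illustration.

```python
from collections import Counter
import re


def word_features(text):
    """Word-based representation: counts of lowercase alphanumeric tokens.

    The tokenisation rule is an assumption, not taken from the paper.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text, n=3):
    """Character n-gram representation: counts of overlapping n-grams.

    Whitespace is collapsed to single spaces first; n=3 is an assumed default.
    """
    normalised = " ".join(text.lower().split())
    return Counter(normalised[i:i + n] for i in range(len(normalised) - n + 1))


def email_to_text(subject, body, use_header=True):
    """Optionally prepend header information to the body.

    Only the subject line is used here, as a hypothetical stand-in for
    whatever header fields the experiments actually considered.
    """
    return f"{subject} {body}" if use_header else body


sample = email_to_text("Meeting agenda", "Please review the agenda.")
words = word_features(sample)
trigrams = char_ngram_features(sample, n=3)
```

Either feature dictionary could then be passed to any of the learners named in the abstract. Toggling `use_header` gives the with-header and without-header variants whose accuracy the abstract compares.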
