Abstract

This paper provides a comprehensive review and analysis of the detection of suspicious terrorist electronic mail (e-mail) using the various phases and methods of text classification. We explored, analyzed, and compared the datasets, features, feature extraction techniques, feature representation techniques, feature selection schemes, text classification techniques, and performance metrics used in the detection of suspicious terrorist e-mails. Thirty articles were retrieved from six well-known academic databases after rigorous selection. We found that researchers often generate their own e-mail datasets because no public dataset is available for detecting suspicious terrorist e-mails. Most studies used content-based and context-based features to detect terrorist e-mails. Our findings also show that the most commonly used feature extraction techniques are bag of words and n-grams; the most frequently applied feature representation schemes are binary representation and term frequency; the most widely adopted feature selection method is information gain; the most common and most accurate text classification algorithms are Naive Bayes, decision trees, and support vector machines; and the most widely used performance metrics are accuracy, precision, and recall. This review also summarizes open research challenges and issues that require significant research effort, and its critical analysis offers insights to guide future researchers in suspicious terrorist e-mail detection using text classification. The issues and challenges identified here can serve as future research directions in this area.
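As a rough illustration of the feature extraction, representation, and selection steps named above, the following minimal Python sketch builds bag-of-words and bigram features, the binary and term-frequency representations, and an information gain score for a single term. It is not taken from any of the reviewed studies: the whitespace tokenizer, the function names, and the four toy messages with their labels are all assumptions invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def term_frequency(tokens):
    """Bag-of-words term-frequency representation: token -> occurrence count."""
    return Counter(tokens)


def binary_representation(tokens):
    """Bag-of-words binary representation: token -> 1 if it occurs at all."""
    return {tok: 1 for tok in set(tokens)}


def bigrams(tokens):
    """Word-level 2-grams as tuples of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder


# Invented toy messages and labels, for illustration only.
messages = [
    "meeting moved to friday",
    "deliver the package friday",
    "lunch on friday",
    "deliver the package tonight",
]
labels = ["normal", "suspicious", "normal", "suspicious"]
docs = [set(tokenize(m)) for m in messages]

# Rank the vocabulary by information gain, highest first.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy data, a term that appears only in one class (such as "package") receives a higher information gain than a term spread across both classes (such as "friday"), which is the property a feature selection step relies on when it keeps only the top-ranked terms before training a classifier.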
