Feature Extraction or Feature Selection for Text Classification: A Case Study on Phishing Email Detection

Masoumeh Zareapoor, Seeja K R

doi:10.5815/ijieeb.2015.02.08

Abstract

Dimensionality reduction is generally performed when classifying high-dimensional data such as text. It can be done with either feature extraction or feature selection techniques. This paper analyses which dimensionality reduction technique is better for classifying text data such as emails. Email classification is difficult because emails have high-dimensional, sparse features that degrade the generalization performance of classifiers. In phishing email detection, dimensionality reduction techniques are used to keep the most informative and discriminative features from a collection of emails, consisting of both phishing and legitimate messages, for better detection. Two feature selection techniques (Chi-Square and Information Gain Ratio) and two feature extraction techniques (Principal Component Analysis and Latent Semantic Analysis) are used for the analysis. The results show that the feature extraction techniques offer better classification performance, give stable results across different numbers of selected features, and robustly maintain their performance over time.
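To make the distinction between the two families concrete, here is a minimal, dependency-free sketch on a hypothetical six-email toy corpus (not data from the paper). Feature selection (Chi-Square, Information Gain Ratio) scores each original term against the phishing/legitimate label and keeps the top-ranked terms; feature extraction (PCA, LSA) instead projects every email onto a few new composite dimensions. It is an illustration under stated assumptions, not the authors' implementation: the corpus, binary term-presence features for scoring, raw term counts without tf-idf weighting for PCA/LSA, and power iteration in place of a library SVD are all choices made here for brevity.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the paper): (email text, label), 1 = phishing, 0 = legitimate.
EMAILS = [
    ("verify your account password now", 1),
    ("urgent verify your bank account", 1),
    ("click this link to reset your password", 1),
    ("meeting notes attached for your review", 0),
    ("lunch meeting moved to friday", 0),
    ("please review the attached project notes", 0),
]


def vocabulary(docs):
    """Sorted list of distinct whitespace-separated terms."""
    return sorted({w for text, _ in docs for w in text.split()})


# ---- Feature selection: score each ORIGINAL term, then keep the best ones ----
def chi_square(docs, term):
    """Chi-square statistic of the 2x2 table (term present/absent) x (phishing/legitimate)."""
    n = len(docs)
    a = sum(1 for t, y in docs if term in t.split() and y == 1)  # present, phishing
    b = sum(1 for t, y in docs if term in t.split() and y == 0)  # present, legitimate
    c = sum(1 for t, y in docs if term not in t.split() and y == 1)  # absent, phishing
    d = n - a - b - c  # absent, legitimate
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(k / n * math.log2(k / n) for k in Counter(values).values())


def gain_ratio(docs, term):
    """Information gain of the binary feature 'term present', divided by its split information."""
    labels = [y for _, y in docs]
    present = [y for t, y in docs if term in t.split()]
    absent = [y for t, y in docs if term not in t.split()]
    n = len(docs)
    conditional = sum(len(p) / n * entropy(p) for p in (present, absent) if p)
    split_info = entropy([term in t.split() for t, _ in docs])
    return (entropy(labels) - conditional) / split_info if split_info else 0.0


# ---- Feature extraction: build NEW dimensions as linear combinations of all terms ----
def doc_term_matrix(docs, vocab):
    """Rows = emails, columns = vocabulary terms, entries = raw term counts."""
    return [[text.split().count(w) for w in vocab] for text, _ in docs]


def top_eigenvectors(m, k, iters=200):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    size = len(m)
    vecs = []
    for _ in range(k):
        v = [1.0 + i for i in range(size)]  # deterministic, non-degenerate start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives its eigenvalue.
        lam = sum(v[i] * m[i][j] * v[j] for i in range(size) for j in range(size))
        vecs.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return vecs


def project(x, k, center):
    """Project rows of x onto k components: PCA if center=True, LSA (truncated SVD) if False."""
    cols = len(x[0])
    if center:  # PCA: subtract each term's mean first
        means = [sum(row[j] for row in x) / len(x) for j in range(cols)]
        x = [[row[j] - means[j] for j in range(cols)] for row in x]
    # Eigenvectors of X^T X are the right singular vectors of X (principal axes if X is centered).
    gram = [[sum(row[i] * row[j] for row in x) for j in range(cols)] for i in range(cols)]
    axes = top_eigenvectors(gram, k)
    return [[sum(r * a for r, a in zip(row, axis)) for axis in axes] for row in x]


if __name__ == "__main__":
    vocab = vocabulary(EMAILS)
    print("chi-square top 5:", sorted(vocab, key=lambda w: chi_square(EMAILS, w), reverse=True)[:5])
    print("gain ratio top 5:", sorted(vocab, key=lambda w: gain_ratio(EMAILS, w), reverse=True)[:5])
    x = doc_term_matrix(EMAILS, vocab)
    print("PCA (k=2):", [[round(v, 3) for v in row] for row in project(x, 2, center=True)])
    print("LSA (k=2):", [[round(v, 3) for v in row] for row in project(x, 2, center=False)])
```

The selection functions return one score per existing term, so the reduced representation is still a subset of the vocabulary; `project` returns k new coordinates per email that mix all terms. On a realistic corpus one would more likely reach for a library (for example scikit-learn's `chi2`, `PCA`, and `TruncatedSVD`) rather than hand-rolled power iteration.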

Journal: International Journal of Information Engineering and Electronic Business
Publication Date: Mar 8, 2015
Citations: 73

Similar Papers

A Comparison of Feature Selection and Feature Extraction Techniques for Condition Monitoring of a Hydraulic Actuator
Ryan Meekins ... Qing Dong
Annual Conference of the PHM Society | VOL. 9
02 Oct 2017

Risk Analysis in Electronic Payments and Settlement System Using Dimensionality Reduction Techniques
B Emil Richard Singh ... E Sivasankar
01 Jan 2018

A Review of Dimensionality Reduction Techniques for Efficient Computation
S Iwin Thankumar Joseph ... S Velliangiri
Procedia Computer Science | VOL. 165
01 Jan 2019

An efficient framework for heart disease classification using feature extraction and feature selection technique in data mining
E Kannan ... R Kavitha
01 Feb 2016