Abstract

Many email filters have been developed to classify spam and phishing email. However, phishing email filters remain comparatively scarce because feature extraction and feature selection for this kind of data are complex. Features for classifying phishing emails fall into several categories, drawn either from the email itself or from the human side. A key challenge is that there is no agreement on which features are best for classifying phishing emails; previous experiments provide no benchmark for the features to use. This research aims to offer new insight into the feature selection process for phishing email classification. To that end, it extracts features by category and uses a machine learning approach to determine which features have the most impact on classifying an email as phishing or not phishing. Feature selection is an essential step toward good classification results, so obtaining the best features from both the email and human behavior is expected to significantly improve phishing classification. The research collects a public phishing email dataset, extracts features by category using Python, and determines feature importance with machine learning approaches using the PyCaret library. Three experiments were run on the dataset: the feature categories were evaluated separately, and one experiment evaluated the combined features. Binary classification was also performed with the extracted features. The experiments show that the proposed method gives good feature-importance results and that binary classification with the selected features achieves higher accuracy than previous research. The best result, 98% accuracy, was obtained by classification with the combined features. Hence, this research shows that the selected features improve classification performance.
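The category-based extraction step can be sketched with Python's standard-library `email` module. This is a minimal sketch under stated assumptions: the specific features (URL count, IP-address links, HTML body, attachments, mismatched Reply-To, urgency terms, generic greeting) and the helper names `body_text`, `extract_features`, and `combine_categories` are hypothetical illustrations, not the feature set or code used in this research. It only shows features grouped into an email-part category and a human-part category, plus a flattened view corresponding to a combined-feature experiment.

```python
import re
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

# Hypothetical feature definitions for illustration only; they are NOT
# the feature set used in the study.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IP_URL_RE = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}\b", re.IGNORECASE)
URGENCY_RE = re.compile(
    r"\b(?:urgent|verify|suspended|immediately|password)\b", re.IGNORECASE
)
GENERIC_GREETING_RE = re.compile(r"\bdear (?:customer|user|client)\b", re.IGNORECASE)


def body_text(msg: Message) -> str:
    """Concatenate the decoded text/* parts of a (possibly multipart) message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def extract_features(raw: str) -> dict[str, dict[str, int]]:
    """Return hypothetical features grouped by category for one raw email."""
    msg = message_from_string(raw)
    body = body_text(msg)
    subject = msg.get("Subject", "")
    sender = parseaddr(msg.get("From", ""))[1].lower()
    reply_to = parseaddr(msg.get("Reply-To", ""))[1].lower()

    # Email-part features: structure and links of the message itself.
    email_part = {
        "num_urls": len(URL_RE.findall(body)),
        "has_ip_url": int(IP_URL_RE.search(body) is not None),
        "is_html": int(any(p.get_content_type() == "text/html" for p in msg.walk())),
        "num_attachments": sum(1 for p in msg.walk() if p.get_filename()),
        "reply_to_differs": int(bool(reply_to) and reply_to != sender),
    }
    # Human-part features: wording aimed at the recipient's behavior.
    text = f"{subject}\n{body}"
    human_part = {
        "urgency_terms": len(URGENCY_RE.findall(text)),
        "generic_greeting": int(GENERIC_GREETING_RE.search(text) is not None),
    }
    return {"email": email_part, "human": human_part}


def combine_categories(features: dict[str, dict[str, int]]) -> dict[str, int]:
    """Flatten all categories into one row, as in a combined-feature experiment."""
    return {
        f"{cat}_{name}": value
        for cat, group in features.items()
        for name, value in group.items()
    }
```

Running `extract_features` over every message in a labeled dataset yields one row per email; keeping only the `email` or `human` group, or using the output of `combine_categories`, gives per-category and combined feature tables. The abstract describes passing such features to machine learning models in PyCaret to rank feature importance and perform binary classification; that step is not reproduced here.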
