Abstract
Phishing is one of the major challenges facing e-commerce today. Companies and individuals have lost billions of dollars to phishing attacks. In 2012, an online report put the loss due to phishing attacks at about $1.5 billion. The global impact of phishing is expected to keep growing, so more efficient phishing detection techniques are needed to curb the menace. This paper investigates and reports the use of the random forest machine learning algorithm in the classification of phishing attacks, with the major objective of developing an improved phishing email classifier with better prediction accuracy and fewer features. From a dataset consisting of 2000 phishing and ham emails, a set of prominent phishing email features (identified from the literature) was extracted and used by the machine learning algorithm, with a resulting classification accuracy of 99.7% and low false negative (FN) and false positive (FP) rates.
Highlights
Phishing is one of the many types of fraud committed today
From a dataset consisting of 2000 phishing and ham emails, a set of prominent phishing email features were extracted and used by the machine learning algorithm with a resulting classification accuracy of 99.7% and low false negative (FN) and false positive (FP) rates
In 10-fold cross validation, the dataset is divided into 10 parts; 9 of the 10 parts are used to train the classifier, and the trained classifier is then tested on the remaining 10th part; this is repeated 10 times with a different held-out part each time, so that, by the end of the training and testing phase, every part has been used as both training and testing data
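The 10-fold procedure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `k_fold_indices` is hypothetical, the folds are taken as contiguous index ranges, and no shuffling is done (in practice the emails would normally be shuffled first so that each fold mixes phishing and ham messages). The sample count of 2000 comes from the dataset size quoted in the abstract.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) once for each of the k folds.

    Hypothetical helper for illustration; assumes the samples are
    already in a suitable (e.g. shuffled) order.
    """
    # Spread any remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        # The current fold is held out for testing ...
        test = indices[start:start + size]
        # ... and the other k - 1 folds are used for training.
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(2000, k=10))
test_counts = [len(test) for _, test in folds]
train_counts = [len(train) for train, _ in folds]
print(test_counts[0], train_counts[0])  # 200 1800
```

Each email index appears in exactly one test fold and in nine training folds, which is the property the highlight describes. A classifier would be trained and scored once per fold, and the ten scores averaged.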
Summary
Phishing is one of the many (and lucrative) types of fraud committed today. In criminal law, fraud is defined as a deliberate deception made for personal gain or to damage an individual’s image. Phishing attackers usually carry out their attacks by sending well-composed messages (known as socially engineered messages) to users in order to persuade them to reveal personal information, which the fraudster then uses to gain unauthorized access to the user’s account. [...] the ability to handle existing phishing patterns, leaving email users prone to new phishing attacks. This is a loophole because fraudsters are not static in their activities; they change their mode of operation as often as possible to stay undetected. This motivated many researchers to seek other effective techniques that can handle both known and emerging fraud, which led to the use of machine learning algorithms.