Abstract
Phishing email attacks have been a dominant cyber-criminal strategy for decades. Despite their longevity, they evolved further during the COVID-19 pandemic, indicating that adversaries exploit critical situations to lure victims. Many detectors have been proposed over the years, focusing mainly on either the content or the textual information of emails. To cope with the evolution of phishing emails, however, more sophisticated approaches are needed that exploit all of an email’s traits to enhance the detection capability of Machine Learning/Deep Learning classifiers. To address the limitations of existing works, this paper proposes a phishing email detection methodology, named HELPHED, that detects phishing emails by combining Ensemble Learning methods with hybrid features. The hybrid features provide an accurate representation of emails by fusing their content and textual traits. We propose two variants of HELPHED: the first employs the Stacking Ensemble Learning method, while the second uses the Soft Voting Ensemble Learning method. Both variants deploy two different Machine Learning algorithms that handle the content-based and text-based features separately, yet in parallel, reducing the features’ complexity and improving the model’s performance. A thorough evaluation is carried out following innovative guidelines that aim to prevent partial and misleading results. Experimental tests verified that combining hybrid features with Ensemble Learning achieves, overall, better detection performance than using only content-based or only text-based features. Numerical results on a rich, imbalanced dataset (32,051 benign and 3,460 phishing email samples) that reflects the evolution of phishing emails show that Soft Voting Ensemble Learning outperforms other prominent Machine Learning/Deep Learning algorithms and existing works, yielding an F1-score of 0.9942.
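As an illustration of the Soft Voting idea mentioned above, the following minimal Python sketch averages the class probabilities produced by two classifiers, one assumed to be trained on content-based features and one on text-based features, and picks the most probable class. The function name `soft_vote`, the equal default weights, and the label convention (0 = benign, 1 = phishing) are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def soft_vote(
    content_probs: Sequence[float],
    text_probs: Sequence[float],
    weights: Tuple[float, float] = (0.5, 0.5),
) -> Tuple[int, List[float]]:
    """Combine two classifiers' class probabilities by weighted averaging.

    content_probs / text_probs: per-class probabilities for one email,
    e.g. [P(benign), P(phishing)], from two hypothetical base models.
    Returns the predicted class index and the averaged probabilities.
    """
    if len(content_probs) != len(text_probs):
        raise ValueError("both classifiers must score the same classes")
    w_content, w_text = weights
    total = w_content + w_text
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted mean of the two probability vectors, class by class.
    averaged = [
        (w_content * c + w_text * t) / total
        for c, t in zip(content_probs, text_probs)
    ]
    # The predicted label is the class with the highest averaged probability.
    label = max(range(len(averaged)), key=averaged.__getitem__)
    return label, averaged


if __name__ == "__main__":
    # Hypothetical scores: the content model leans benign, the text model
    # leans phishing; the averaged probabilities decide the final label.
    label, probs = soft_vote([0.7, 0.3], [0.2, 0.8])
    print(label, probs)
```

In this sketch each base model sees only its own feature group, which mirrors the "separately, yet in parallel" handling described in the abstract; a Stacking variant would instead feed the two probability vectors into a further meta-classifier.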