Phishing is a widespread cybercrime in which scammers trick people into sharing personal or confidential information by impersonating a legitimate website. Although many machine learning methods detect phishing websites using features extracted from web samples, little attention has been paid to selecting the right features efficiently. This study identifies the features that are essential for effective phishing detection, with the aim of improving the accuracy and efficiency of machine learning systems. By identifying and prioritizing these features, our research contributes to simpler methods that keep users safe from phishing threats. We focus on the key characteristics that consistently distinguish phishing websites from legitimate ones, improving the precision and reliability of phishing detection algorithms through careful analysis and experimentation. This work aligns with the broader goals of strengthening cybersecurity measures, protecting individuals and organizations from online deception, and giving users more robust tools for secure online navigation. To evaluate feature selection for developing a generalizable phishing detector, the classifiers are trained on a separate out-of-sample data set of 14,000 website samples. The maximum F-measure obtained with feature selection is 95%, using Random Forest classification. In addition, 9 universal features are selected across all three data sets. The F-measure using this universal feature set is approximately 93%, which is comparable to the 95% maximum. Since the universal feature set contains no features from third-party services, this finding implies that fast phishing detection, which is also robust to zero-day attacks, can be achieved without querying external sources.
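As a rough illustration of two ideas in the abstract, the sketch below computes a "universal" feature set as the intersection of the features selected on each of three data sets, and computes the F-measure (F1) for the phishing class. It is a minimal sketch under stated assumptions, not the study's implementation: the data set names, feature names, labels, and predictions are all hypothetical, and the feature selection step itself is assumed to have already run.

```python
from typing import Dict, List, Sequence, Set


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (1 = phishing, 0 = legitimate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def universal_features(selected: Dict[str, List[str]]) -> Set[str]:
    """Return the features that were selected on every data set."""
    feature_sets = [set(names) for names in selected.values()]
    return set.intersection(*feature_sets) if feature_sets else set()


# Hypothetical selection results; names are illustrative, not from the study.
selected_per_dataset = {
    "dataset_a": ["url_length", "has_ip_address", "num_dots", "domain_age"],
    "dataset_b": ["url_length", "has_ip_address", "num_dots", "page_rank"],
    "dataset_c": ["url_length", "num_dots", "has_ip_address", "uses_https"],
}
universal = universal_features(selected_per_dataset)
print(sorted(universal))  # ['has_ip_address', 'num_dots', 'url_length']

# Hypothetical labels and classifier predictions on held-out samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(round(f_measure(y_true, y_pred), 3))  # 0.75
```

In this toy example the intersection happens to contain only features computable from the URL itself, which mirrors the abstract's point that a universal set without third-party features can be evaluated without external lookups.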