Abstract
Cybercriminals create phishing websites that mimic legitimate websites to steal sensitive information from companies, individuals, or governments. It is therefore imperative to use state-of-the-art artificial intelligence and machine learning technologies to correctly classify phishing and legitimate URLs. We report the results of applying deterministic and probabilistic neural network models to URL classification. The key achievements of this work are: (1) We develop a unique approach based on probabilistic neural networks that improves classification accuracy. (2) We show for the first time in URL phishing research that a machine learning model trained on a combination of open-source and private datasets is successful in production. The dataset is constructed from open sources such as Alexa, PhishTank, and OpenPhish and, most importantly, from real-world production data from EasyDMARC. The model is validated daily on newly reported URLs and their corresponding labels, from both open-source platforms and private production, and reaches on average 97% accuracy on this validation dataset. The labels come from PhishTank, OpenPhish, and EasyDMARC; mislabeled data cannot be excluded, and the labels could not be checked because of the large number of URLs. Feature engineering was done without third-party dependencies. Lastly, the evaluation of both deterministic and probabilistic models shows high accuracy on short and long URLs, where short URLs are defined as having fewer than 50 characters.
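To make the last two points concrete, the following is a minimal sketch of lexical feature extraction that relies only on the Python standard library, together with the short-URL criterion stated above (fewer than 50 characters). The function names and the particular features are illustrative assumptions; they are not the feature set used in this work.

```python
from urllib.parse import urlsplit

# Threshold taken from the abstract: short URLs have fewer than 50 characters.
SHORT_URL_MAX_LEN = 50


def is_short_url(url: str) -> bool:
    """Return True if the URL counts as 'short' under the abstract's definition."""
    return len(url) < SHORT_URL_MAX_LEN


def lexical_features(url: str) -> dict:
    """Compute a few hypothetical lexical features from the URL string alone.

    No network lookups or third-party services are used; the features are
    illustrative and not the authors' actual feature set.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "length": len(url),
        "is_short": is_short_url(url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        "host_length": len(host),
        # Number of non-empty path segments, e.g. "/a/b" has depth 2.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


if __name__ == "__main__":
    print(lexical_features("https://example.com/login"))
```

A feature dictionary of this kind could be turned into a numeric vector and passed to either a deterministic or a probabilistic classifier; splitting the evaluation set with `is_short_url` would allow accuracy to be reported separately for short and long URLs.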