Abstract

Phishing is a criminal act in which a phisher creates nearly identical website links that exploit URL Lexical characteristics to dupe unsuspecting users into exposing sensitive information such as financial data, addresses, and other personal details. Phishers have recently sought to trick security experts as well, masking malicious URLs with obfuscation techniques so that they appear legitimate. This leads us to conclude that approaches based only on URL Lexical feature analysis are no longer sufficient, and that additional analysis techniques are required. This research is a first step toward designing and developing a decision-making system that detects and classifies malicious URLs using a combination of URL Lexical and Network Traffic features, rather than relying solely on one or the other. To that end, we examined and assessed the use of URL Lexical and Network Traffic features for detecting malicious URLs. The study uses three methodologies: Complete Features (no feature selection, abbreviated W/O), the Kaiser-Meyer-Olkin (KMO) test as a feature selection method, and Principal Component Analysis (PCA) as a dimensionality reduction method. Each is tested with the Logistic Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) classification algorithms and evaluated by the accuracy measure derived from the confusion matrix. With Network Traffic features (ISCXURL dataset), LR, SVM, and KNN reach 92%, 94%, and 93% accuracy under the W/O approach, SVM reaches 91% under the KMO approach, and LR and SVM reach 92% and 94% under the PCA approach, in these configurations surpassing the use of Lexical features (UCI dataset). In contrast, with Lexical features (UCI dataset), LR and KNN reach 90% and 94% accuracy under the KMO approach and KNN reaches 95% under the PCA approach, in these configurations surpassing the use of Network Traffic features (ISCXURL dataset). As a result, we are confident in proceeding to the next step: designing and developing a decision-making application that detects and classifies malicious URLs using both URL Lexical and Network Traffic features.
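
The accuracy measure used for evaluation can be illustrated with a small example. The following is a minimal sketch, not the study's code: it assumes binary labels (1 = malicious, 0 = benign), and the function names and toy labels are hypothetical, chosen only to show how accuracy is read off a confusion matrix as (TP + TN) / (TP + TN + FP + FN).

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = malicious, 0 = benign)."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tp": counts[(1, 1)],  # malicious, predicted malicious
        "tn": counts[(0, 0)],  # benign, predicted benign
        "fp": counts[(0, 1)],  # benign, predicted malicious
        "fn": counts[(1, 0)],  # malicious, predicted benign
    }


def accuracy(cm):
    """Accuracy = correctly classified samples / all samples."""
    total = cm["tp"] + cm["tn"] + cm["fp"] + cm["fn"]
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (cm["tp"] + cm["tn"]) / total


# Toy labels for illustration only; they are not taken from either dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(f"accuracy = {accuracy(cm):.0%}")
```

In this toy case 8 of 10 URLs are classified correctly, so the accuracy is 80%. The percentages reported above would be obtained the same way from each classifier's predictions on held-out test data.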
