Abstract
Phishing attacks increasingly use digital certificates to appear safe to users, and the frequency of such attacks has surged in recent years; for example, around 80% of phishing attacks in 2021 used digital certificates to appear legitimate. The most common methods for detecting phishing websites today rely on users reporting suspicious websites to phishing repositories, where they are then confirmed. This process can be slow, giving attackers time to keep their phishing websites online. Newer methods that apply machine learning models to the digital certificates of websites have been shown to be effective for phishing detection. This paper presents a system that combines certificate-related and domain-name-related features with machine learning methods to detect phishing websites. To develop the system, domain names were collected from PhishTank and Tranco, and certificates were retrieved using Censys. The domain-related features are partly engineered using a time-series-based deep learning model that produces a vector representation of each domain name. Using the features engineered from the certificate and domain name, several classical machine learning classifiers are trained and compared. Enriching the feature set with the vector representation of the domain names improves performance in distinguishing suspicious certificates from benign ones, raising the F1-score from 0.77 for a feature set based solely on certificate-related features to 0.89 for the enriched feature set. A time-based evaluation shows similar performance, with an F1-score of 0.88, which is an improvement over existing approaches to feature engineering.
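To make the two central ideas concrete, here is a minimal standard-library sketch of (a) enriching a certificate feature vector by appending a domain-name embedding, and (b) computing the F1-score for the suspicious class, F1 = 2PR / (P + R). The function names, the label convention (1 = suspicious, 0 = benign) and the toy inputs are hypothetical choices for illustration. This is not the authors' implementation, and the domain vector here is only a placeholder for the output of their deep learning model.

```python
from typing import List, Sequence


def enrich_features(cert_features: Sequence[float],
                    domain_vector: Sequence[float]) -> List[float]:
    """Hypothetical helper: append a domain-name embedding to certificate features."""
    return list(cert_features) + list(domain_vector)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1-score for the positive class (assumed here: 1 = suspicious certificate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels: 2 true positives, 1 false positive, 1 false negative.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    print(f1_score(y_true, y_pred))
    print(enrich_features([1.0, 0.0], [0.5, 0.25]))
```

In the toy example, precision and recall are both 2/3, so the F1-score is also 2/3. Because F1 is the harmonic mean of precision and recall, it penalizes a classifier that trades many false positives for high recall, which matters when benign certificates vastly outnumber suspicious ones.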