Combining Lexical, Host, and Content-based features for Phishing Websites detection using Machine Learning Models

Samiya Hamadouche, Ouadjih Boudraa, Mohamed Gasmi

doi:10.4108/eetsis.4421

Abstract

In the cybersecurity field, identifying and handling threats from malicious websites (for example, phishing, spam, and drive-by downloads) is a major concern for the community. Consequently, effective detection methods have become a necessity. Recent advances in Machine Learning (ML) have renewed interest in its application to a variety of cybersecurity challenges. To detect phishing URLs, ML models rely on specific attributes, such as lexical, host, and content-based features. The main objective of our work is to propose, implement, and evaluate a solution for identifying phishing URLs based on a combination of these feature sets. This paper focuses on using a new balanced dataset, extracting useful features from it, and selecting the optimal features with different feature selection techniques, in order to build four ML models (SVM, Decision Tree, Random Forest, and XGBoost) and conduct a comparative performance evaluation of them. Results showed that the XGBoost model outperformed the other models, with an accuracy of 95.70% and a false negative rate of 1.94%.
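As a rough illustration of two ideas mentioned in the abstract, the following Python sketch computes a handful of lexical features from a URL string and derives accuracy and false negative rate from predicted labels. It is not the authors' implementation: the function names, the particular features, and the label convention (1 = phishing, 0 = legitimate) are assumptions made for this example, and the false negative rate is assumed to follow the conventional definition FN / (FN + TP), which the abstract does not spell out.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    The feature set is hypothetical; the paper's actual features are
    not listed in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


def accuracy_and_fnr(y_true, y_pred):
    """Return (accuracy, false negative rate).

    Assumes labels 1 = phishing and 0 = legitimate, and the
    conventional definition FNR = FN / (FN + TP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fnr


if __name__ == "__main__":
    print(lexical_features("https://example.com/login"))
    print(accuracy_and_fnr([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full pipeline, feature dictionaries like these would typically be combined with host and content-based features before feature selection and model training. A low false negative rate matters here because a false negative is a phishing URL classified as legitimate.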


Journal: ICST Transactions on Scalable Information Systems
Publication Date: Apr 17, 2024
Citations: 1
License type: CC BY-NC-SA 4.0
