Improving Suspicious URL Detection through Ensemble Machine Learning Techniques

Sanjukta Mohanty, Arup Abhinna Achary, Laki Sahu

doi:10.1201/9781003184140-13

Abstract

In this chapter, we design a framework for detecting suspicious URLs from features of the URL itself, without requiring the content of the web page. We distinguish malicious web pages using discriminative URL features, including lexical features, HTTP header information-based features, host-based features, geographical features and network features, whose high predictive power improves performance significantly. Our approach uses both batch machine learning (ML) algorithms and ensemble machine learning classifiers (EMLCs) to identify suspicious URLs. EMLCs combine multiple weak learners, each trained on different training examples, to enhance model performance (TRAGHA 2019). We compared the batch ML algorithms with the ensemble models experimentally and found that the ensemble approach outperforms the batch ML classifiers. This work extends our previous approach (Mohanty et al. 2020), which used only some batch learning classifiers and a few URL features to classify URLs as either malicious or safe; in that work, the random forest (RF) model achieved the highest accuracy, at 95%. The proposed approach is evaluated against a training dataset that contains both safe and malicious URLs, and the ensemble techniques obtain a TPR (true positive rate) of 0.98, an FPR (false positive rate) of 0.01, an accuracy of 98.66%, a precision of 0.95, a recall of 100%, an F1 score of 96% and an AUC of 0.982.
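As a rough illustration of what URL-only lexical features can look like, the following minimal sketch derives a few values from the URL string without fetching the page. The function name `lexical_features`, the particular features chosen and the example URL are assumptions made for illustration; the abstract does not list the chapter's actual feature set.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from the URL string alone.

    Hypothetical feature set: the chapter's real features are not given
    in the abstract, so these are common examples only.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                       # total characters in the URL
        "host_length": len(host),                     # characters in the host name
        "num_dots": host.count("."),                  # dots in the host name
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "path_depth": len([p for p in parsed.path.split("/") if p]),
        "uses_https": parsed.scheme == "https",
    }


# Example URL invented for illustration only.
features = lexical_features("http://example-login.test/a/b/index.php?id=42")
```

A feature dictionary of this kind could be turned into a numeric vector and passed to either a single classifier or an ensemble of classifiers, which is the comparison the abstract describes.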
