Discrete mathematical models for enhancing cybersecurity : A mathematical and statistical analysis of machine learning approaches in phishing attack detection

Dinesh Goyal, Priya Mathur, Farhan Sheth, Amit Kumar Gupta

doi:10.47974/jdmsc-1893

Abstract

This paper presents a discrete mathematical modelling of phishing attack detection methodologies, emphasizing the need for continual advances in detection methods given the pervasive threat of phishing in the cybersecurity landscape. Leveraging mathematical modeling and machine learning algorithms, the study employs three distinct datasets: Mendeley, URL tokenized, and a merged dataset integrating both. Multiple machine learning algorithms, including Logistic Regression, k-Nearest Neighbors, Support Vector Machines, Random Forest, Gradient Boosting Machines, Neural Networks, CatBoost, and XGBoost, are systematically applied to evaluate their efficacy. On the original Mendeley dataset, XGBoost achieves the top accuracy of 97.24%, with CatBoost and Random Forest also exceeding 97%. After preprocessing, CatBoost leads with an accuracy of 97.28% and shows superior precision, sensitivity, and F-score. Despite slight accuracy reductions after preprocessing, models consistently achieve over 94% accuracy on the preprocessed Mendeley dataset, highlighting the substantial impact of preprocessing. Tokenized URLs yield comparatively lower performance, with a highest accuracy of 91.95%, which points to the challenges of this approach. The combined dataset proves optimal for most models, with XGBoost and SVM achieving the highest overall accuracy of 97.68%. SVM excels in sensitivity and specificity, while XGBoost excels in precision. The merged dataset significantly enhances accuracy, sensitivity, specificity, and precision, underscoring its role in refining predictive capabilities for identifying phishing websites. The results section gives a detailed overview of model performance on each dataset: CatBoost stands out on the preprocessed Mendeley dataset, the tokenized URLs offer insight into the challenges of that approach, and the combined dataset proves effective across models. Confusion matrices, ROC curves, and Precision-Recall curves provide nuanced perspectives on model behavior, emphasizing the need for ongoing refinement and for investigation of misclassification patterns to make models more effective against phishing threats.
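
The metrics named in the abstract (accuracy, precision, sensitivity, specificity, and F-score) can all be derived from the four cells of a binary confusion matrix. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes that "positive" means phishing and that the F-score is the balanced F1 variant. The names `ConfusionCounts` and `binary_metrics` are hypothetical, and the counts in the usage example are invented for illustration rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix; 'positive' is assumed to mean phishing."""
    tp: int  # phishing sites flagged as phishing
    fp: int  # legitimate sites flagged as phishing
    tn: int  # legitimate sites passed as legitimate
    fn: int  # phishing sites passed as legitimate


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (metric undefined)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: ConfusionCounts) -> dict:
    """Compute the five metrics named in the abstract from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # Assumption: F-score is F1, the harmonic mean of precision and sensitivity.
        "f_score": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    example = ConfusionCounts(tp=90, fp=10, tn=85, fn=15)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters in this setting because a missed phishing site and a falsely flagged legitimate site carry different costs, which a single accuracy figure hides.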

Journal: Journal of Discrete Mathematical Sciences and Cryptography
Publication Date: Jan 1, 2024
Citations: 1