Improving the CMLM Algorithm by Reducing the High Rate of Indiscernibility Relations Among Website Phishing Objects

Abstract

Phishing is a common cybercrime in which attackers trick people into revealing sensitive information. Phishing attacks are serious because they let the attacker observe and steal the victim’s personal and sensitive information, such as credit card numbers and passwords, while the victim is browsing the phishing website. Security experts are responsible for developing and continually improving algorithms that protect the community and guard its information. Because traditional security techniques fail to do this effectively, experts are looking for new and robust methods. In this research, we use machine learning (ML) to improve phishing detection. ML is effective at recognizing hidden patterns in large data inputs. The CMLM algorithm, used for detecting phishing website attacks, suffers from a high rate of indiscernibility relations, which reflect a lack of understanding of the outputs. Additionally, it does not account for imbalanced data. This paper proposes two new versions of the CMLM algorithm that address these issues by integrating methods such as K-means clustering, Stability-correlation and correlation (ScC), rough set (RS) theory, principal component analysis (PCA), decision tree (DT), and deep learning (DL) in a controlled manner. The results show that the proposed methods detect phishing websites more accurately than CMLM, achieving accuracies of 100%, 99.96%, and 99.81% across three key datasets. Compared to CMLM, the improvement margins are 0.47%, 3.08%, and 3.45% for DS1, DS2, and DS3, respectively.
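For readers unfamiliar with the term, the following is a minimal sketch of what an indiscernibility relation means in standard rough set theory: two objects are indiscernible when they agree on every chosen attribute. It is not the CMLM algorithm or the authors' code; the feature names (`has_ip`, `https`), the toy records, and both helper functions are hypothetical and serve only as illustration.

```python
from collections import defaultdict

def indiscernibility_classes(objects, attributes):
    """Group object ids whose values agree on every listed attribute."""
    classes = defaultdict(list)
    for obj_id, row in objects.items():
        key = tuple(row[a] for a in attributes)
        classes[key].append(obj_id)
    return list(classes.values())

def inconsistent_classes(objects, attributes, label):
    """Return the classes whose members carry more than one label."""
    return [
        cls for cls in indiscernibility_classes(objects, attributes)
        if len({objects[o][label] for o in cls}) > 1
    ]

# Hypothetical toy data: feature names and values are made up.
sites = {
    "s1": {"has_ip": 1, "https": 0, "label": "phishing"},
    "s2": {"has_ip": 1, "https": 0, "label": "legitimate"},
    "s3": {"has_ip": 0, "https": 1, "label": "legitimate"},
    "s4": {"has_ip": 0, "https": 1, "label": "legitimate"},
}

classes = indiscernibility_classes(sites, ["has_ip", "https"])
conflicts = inconsistent_classes(sites, ["has_ip", "https"], "label")
print(classes)    # groups of websites that look identical on the chosen features
print(conflicts)  # groups that look identical but carry different labels
```

In this toy data, `s1` and `s2` cannot be told apart by the two features yet have different labels, which is the kind of ambiguity a classifier cannot resolve without additional or better features.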

Similar Papers
  • Research Article
  • Cite Count: 2
  • 10.37934/araset.48.2.197210
A Study on the Best Classification Method for an Intelligent Phishing Website Detection System
  • Jul 18, 2024
  • Journal of Advanced Research in Applied Sciences and Engineering Technology
  • Nor Hapiza Mohd Ariffin + 3 more

It is impossible to imagine our lives without the internet, but it has also meant that malicious acts such as phishing can be carried out anonymously. Phishers use social engineering or fake websites to trick their victims into giving them personal information such as credit card numbers, bank passwords and other sensitive information. However, the number of phishing attacks has increased significantly in the last year and current methods of detecting phishing are ineffective. This study focuses on identifying features of phishing websites, evaluating the best dataset and method for applying machine learning classification algorithms, and developing a prototype phishing detection system using the best classification algorithm model. Three machine learning classification algorithms were investigated: decision tree, logistic regression, and k-nearest neighbours. The waterfall methodology of the system development life cycle (SDLC) was used. All approaches, strategies, tools and relevant theories were explored to provide an overview and understanding for this study. An extensive literature review was conducted to develop the model and problem statement. Data was collected through an open-source licenced website. In addition, the data was pre-processed before training and building the model to ensure that no noisy data was present. The parameters of the three models, k-nearest neighbours, decision tree and logistic regression, were adjusted to obtain the best possible model result. The models were then evaluated against the confusion matrix, accuracy, precision, recall, F1 score and decision tree to determine the best classification model for phishing and legitimate websites. The models are fine-tuned with the best parameters for each to achieve an optimal result for phishing detection. After evaluating each model, the decision tree was found to be the most accurate in classifying phishing websites, with an accuracy of 95%. In the future, the system can be improved through different approaches such as deep learning and a fully developed web-based system that can be used in the real world.
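The evaluation metrics named in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp: phishing sites flagged as phishing
    fp: legitimate sites flagged as phishing
    fn: phishing sites missed
    tn: legitimate sites passed as legitimate
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts for illustration only.
m = classification_metrics(tp=90, fp=5, fn=5, tn=100)
print(m)
```

Accuracy alone can hide poor recall on the minority class, which is why precision, recall, and F1 are usually reported alongside it.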

  • Research Article
  • 10.37934/aaij.1.1.2033
A Study on the Best Classification Method for an Intelligent Phishing Website Detection System
  • Mar 15, 2025
  • ASEAN Artificial Intelligence Journal
  • Nor Hapiza Mohd Ariffin + 3 more


  • Research Article
  • 10.55041/isjem05751
AI-Based Phishing Website Detection using Machine Learning
  • Mar 19, 2026
  • International Scientific Journal of Engineering and Management
  • Mohammed Iqbal, Y + 4 more

Phishing websites are a cybersecurity problem because they pretend to be real online services to obtain sensitive information from users, such as login details and financial information. The people who make these phishing websites can shut them down quickly, which makes it hard for traditional methods to detect them. To solve this problem, this paper describes a system that uses machine learning to detect phishing websites by looking at the website address and domain. This way, the system can detect phishing websites without having to look at the webpage. The system is tested on a dataset of 11,055 website addresses with 31 features. These features cover the structure of the website address, who registered the domain, and other indicators of phishing. The system uses two classifiers, Logistic Regression and Random Forest, together with an ensemble model that combines the results from both classifiers to make the predictions more reliable. The results show that this ensemble model is effective at detecting phishing websites and balances precision against catching all the phishing websites. The system is also good at telling fake websites apart from real ones. The authors also created a user interface that can analyze website addresses in real time, which makes the system more useful and practical. The system is a solution for detecting phishing websites in the real world and can be used in many different situations. Phishing websites will continue to be a problem, but with this system they can be detected more easily.
Keywords: Phishing Website Detection, Machine Learning, URL-Based Feature Extraction, Cybersecurity, URL Classification, Random Forest Classifier, Logistic Regression, XGBoost Algorithm, Fusion Model, Soft Voting Ensemble, Attack Type Identification, Credential Phishing, Financial Fraud Detection, ROC Curve Analysis, AUC Performance Metric
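The keywords mention a soft voting ensemble, which in its usual form averages the class probabilities produced by each classifier. The sketch below illustrates that idea in plain Python. The probability values, the equal weights, and the 0.5 decision threshold are assumptions made for illustration and do not come from the paper.

```python
def soft_vote(prob_lists, weights=None):
    """Return the weighted average of per-model phishing probabilities."""
    n_models = len(prob_lists)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]

# Hypothetical phishing probabilities for three URLs from two models.
lr_probs = [0.9, 0.4, 0.2]   # e.g. a logistic regression model
rf_probs = [0.7, 0.8, 0.1]   # e.g. a random forest model

avg = soft_vote([lr_probs, rf_probs])
# Assumed threshold of 0.5 for the final decision.
labels = ["phishing" if p >= 0.5 else "legitimate" for p in avg]
print(avg, labels)
```

The second URL shows why averaging can help: one model is unsure (0.4) while the other is confident (0.8), and the combined score still crosses the threshold.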

  • Research Article
  • Cite Count: 5
  • 10.1111/exsy.13824
Phishing Website Detection: An In‐Depth Investigation of Feature Selection and Deep Learning
  • Jan 29, 2025
  • Expert Systems
  • Soudabe Mousavi + 1 more

Cloud and fog computing technologies benefit from integrating AI‐driven phishing detection as it enhances security, scalability, real‐time reaction, and privacy. Nowadays, there is a noticeable rise in illegal activity taking place online. One of the illicit cybersecurity practices is phishing, in which hackers trick consumers by pretending to be authentic websites and spoofing them to obtain sensitive user information. Phishing attacks, regrettably, have increased dramatically in recent years, according to research. Machine learning (ML) and deep learning (DL) techniques have shown encouraging progress in thwarting these attacks. Consequently, we employed DL and ML techniques to identify phishing websites in this study. This article presents four scenarios in both ML and DL models. Two are proposed in ML, while the others are employed in DL. The outcomes of four scenarios were contrasted to determine which algorithm performed better at distinguishing between legal and illicit websites. Many popular ML techniques were used, including K‐nearest neighbour, random forest (RF), decision trees, and SVMs. PCA and Importance Features are implemented in both ML scenarios to find the best features. RF successfully reached an accuracy of 97.82% using the Importance Feature technique. However, the PCA method failed to improve the performance of ML algorithms. As a result of ML‐based scenarios, 98 features are selected for the final deep learning scenarios. In DL‐based scenarios, algorithm architectures are essential to avoid overfitting and bias due to various hyperparameters. Thus, in the third scenario, our aim focuses on DL architecture design. Multilayer perceptron and convolutional neural networks (CNNs) are employed to detect phishing websites. Finally, our proposed 1D CNN model, using stratified k‐fold cross‐validation, outperformed the classical ML algorithm, achieving 98.94% accuracy and 0.99 AUC‐ROC score in detecting phishing websites.
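Stratified k‐fold cross‐validation, mentioned at the end of this abstract, splits the data so that each fold keeps roughly the same class proportions as the full dataset. The following is a simplified sketch of the fold assignment only, with no shuffling; the function name and the toy labels are hypothetical and do not reflect the authors' setup.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign sample indices to k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    # Deal each class's samples round-robin across the folds.
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Toy imbalanced dataset: 4 phishing and 8 legitimate samples.
labels = ["phish"] * 4 + ["legit"] * 8
folds = stratified_folds(labels, k=4)
for fold in folds:
    print(fold, [labels[i] for i in fold])
```

Each fold then serves once as the held-out test set while the remaining folds are used for training. Library implementations typically also shuffle within each class before assigning folds.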

  • Research Article
  • Cite Count: 14
  • 10.1088/1742-6596/1916/1/012169
Phishing website detection using machine learning and deep learning techniques
  • May 1, 2021
  • Journal of Physics: Conference Series
  • M Selvakumari + 3 more

Phishing has become more damaging because of the rapid growth in internet users. Phishing attacks are now a major threat to people’s daily lives and to the internet environment. In these attacks, the attacker impersonates a trusted entity in order to steal sensitive information or the digital identity of the user, e.g., account credentials, credit card numbers and other user details. A phishing website, also known as a spoofed website, resembles an official website in name and appearance and is created to fool individuals and steal their personal credentials. To identify fraudulent websites, this paper discusses machine learning and deep learning algorithms, applies them to our dataset, and selects the algorithm with the best precision and accuracy for phishing website detection. This work can provide more effective defenses against future phishing attacks.

  • Research Article
  • Cite Count: 3
  • 10.2174/0126662558323858240612064259
A Developed Model Based on Machine Learning Algorithms for Phishing Website Detection
  • Feb 1, 2025
  • Recent Advances in Computer Science and Communications
  • Hussein Abdel-Jaber + 2 more

Introduction: Users are accessing websites for many purposes, such as obtaining information about a particular topic, buying items, accessing their accounts, etc. Cybercriminals use phishing websites to attain the sensitive information of the users, like usernames and passwords, credit card details, etc. Detecting phishing websites helps in protecting the information and the money of people. Machine learning algorithms can be applied to detect phishing websites. Methods: In this paper, a model based on various machine learning algorithms is developed to detect phishing websites. The machine learning algorithms used in this model are Decision Tree, Random Forest, Extra Trees, K-Nearest Neighbors, Multilayer Perceptron and Support Vector Machine. The dataset of phishing websites is taken from the Kaggle website. The algorithms mentioned above of the developed model are compared together to identify which algorithm has better classification results. Results: The extra trees algorithm offers the best results for accuracy, precision, and F1- Score. This paper also compares the developed model with a previous model that uses the same dataset and relies upon decision tree, random forest, and support vector machine to determine which model has better classification report results. The developed model, depending on the Decision Tree and SVM, offers better classification results than those of the previous models. The developed model is compared with another preceding model relying upon Decision Tree and Random Forest algorithms to determine which model generates better results for accuracy, precision, recall/sensitivity, and F1-Score. Conclusion: The developed model, depending on the Decision Tree, presents better results for accuracy, recall, and F1-Score than the results of accuracy, sensitivity, and F1-Score for the preceding model based on the Decision Tree.

  • Research Article
  • Cite Count: 2
  • 10.59256/ijsreat.20250502011
Phishing Website Detection Based on URL Features
  • Apr 10, 2025
  • International Journal Of Scientific Research In Engineering & Technology
  • A Varun Kumar + 4 more

Phishing attacks are a significant threat to internet security, most commonly targeting users through spoofed websites. The study "Phishing Website Detection Based on URL Features" seeks to leverage machine learning algorithms for the detection of phishing sites by identifying specific URL features. The research determines the effectiveness of various feature selection techniques and demonstrates that the Random Forest classifier yields the highest accuracy, 98.23%, with the lowest false positive rate. Based on URL features, the proposed model aims to enhance detection capability, thereby providing an effective defense mechanism against phishing attacks. This approach not only contributes to the field of cybersecurity but also offers practical solutions for safeguarding individuals and organizations against online fraud. "Phishing Website Detection Based on URL Features Using Deep Learning" discusses the application of advanced deep learning techniques to enhance the detection of phishing websites. This paper employs a full dataset of phishing and regular URLs, from which various features are extracted, including structural features and semantic properties of URLs. Employing a deep learning framework with Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), the model is trained to identify patterns that indicate phishing attacks. The results demonstrate an outstanding improvement in detection accuracy, with a true positive rate above 90% and minimal false positives. The research demonstrates the strength of deep learning methods in combating phishing attacks and represents a useful tool for safeguarding users from cyber deception. Phishing is a criminal technique used to deceive individuals into sharing confidential data, such as passwords and credit card numbers, by presenting itself as a trustworthy entity.
The current project detects phishing websites based on URL characteristics, without relying on content analysis or blacklists. By examining structural, lexical, and statistical characteristics of URLs, the system predicts whether a website is genuine or phishing. The model employs machine learning algorithms to offer an efficient and scalable solution to counter phishing attacks, and artificial intelligence was employed for fake-website prediction.

  • Conference Article
  • Cite Count: 114
  • 10.1109/compsac.2019.10211
Detecting Phishing Websites through Deep Reinforcement Learning
  • Jul 1, 2019
  • Moitrayee Chatterjee + 1 more

Phishing is the simplest form of cybercrime, with the objective of baiting people into giving away sensitive information such as personally identifiable data, banking and credit card details, or even credentials and passwords. This simple yet highly effective type of cyber-attack is usually launched through emails, phone calls, or instant messages. The credentials or private data stolen are then used to gain access to critical records of the victims and can result in extensive fraud and monetary loss. Hence, sending malicious messages to victims is a stepping stone of the phishing procedure. A phisher usually sets up a deceptive website, where the victims are conned into entering credentials and sensitive information. It is therefore important to detect these types of malicious websites before they cause any damage to victims. Inspired by the evolving nature of phishing websites, this paper introduces a novel approach based on deep reinforcement learning to model and detect malicious URLs. The proposed model is capable of adapting to the dynamic behavior of phishing websites and thus learns the features associated with phishing website detection.

  • Book Chapter
  • Cite Count: 3
  • 10.1007/978-3-031-24475-9_61
Phishing Website Detection with and Without Proper Feature Selection Techniques: Machine Learning Approach
  • Jan 1, 2023
  • Kibreab Adane + 1 more

Indeed, successful phishing website attempts could result in catastrophic data loss, login credential compromise, ransomware infection, and financial loss. They also significantly hamper the competitiveness and productivity of online users and internet-dependent organizations unless an intelligent anti-phishing solution is devised. Because detecting fresh phishing website attacks with maximum accuracy by discovering hidden patterns in complex datasets is shown to be an intrinsic property of machine learning approaches, the study conducted rigorous experiments on four purposely selected efficient supervised machine learning algorithms before and after applying five widely used feature selection techniques: Recursive Feature Elimination, Pearson Correlation Coefficient, Principal Component Analysis, Uni-variate Feature Selection, and Mutual Information. The proposed study was conducted to balance the research gaps and scientific disputes in the rigorously reviewed studies. The study’s final outcome is a proposal for an intelligent phishing website model that yields higher accuracy, faster response times, and lower average misclassification rates. The study also explored which feature selection techniques contributed more, less, or nothing to enhancing each classifier's accuracy. Compared to the remaining classifiers, the Cat-Boost Classifier attained superior phishing website detection accuracy (97.46%), F1-score (97.49%), a lower average misclassification rate (2.54%), and acceptable train-test computational time (7 s) after using the UFS technique. On the other hand, the PCA technique failed to enhance the accuracy of the Cat-Boost, Gradient-Boost, and Random Forest Classifiers, scoring lower accuracy than was reached before feature selection. To obtain more promising results, in future work, phishing website detection is expected to be carried out using a hybrid feature selection technique, huge datasets, proper deep learning algorithms, and proper model hyper-parameters.
Keywords: Cat-Boost, Machine learning, Phishing website detection, PCA, UFS, RFE, MI, PCC, Feature selection technique

  • Research Article
  • 10.54365/adyumbd.1752606
A Novel DEA-ELM Hybrid Method for Web Phishing Detection
  • Dec 24, 2025
  • Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi
  • Yasin Sönmez + 1 more

Phishing attacks are a pervasive cybersecurity threat, using deceptive web pages to steal users' sensitive information. Detecting phishing sites with high precision and efficiency is crucial for building effective countermeasures. In this study, we propose a novel classification model that integrates a Differential Evolution Algorithm (DEA) with an Extreme Learning Machine (ELM) framework for phishing website detection. The approach introduces a DEA mechanism for inter-feature signal enhancement and couples it with an ELM optimized through a DEA. The proposed DEA-ELM model was evaluated on the Web Page Phishing Detection dataset. Compared to traditional machine learning models such as Random Forest, Logistic Regression, Support Vector Machine (SVM), and Decision Tree, which achieved accuracies between 93% and 97%, the proposed DEA-ELM model achieved 99.86% accuracy, along with high precision, recall, and F1-score metrics. These results confirm the potential of DEA-optimized ELM combined with DEA analysis in creating scalable, accurate, and real-time phishing detection systems. The model also provides a reproducible framework by using publicly available data and open-source feature extraction scripts. Future work may explore hybrid feature selection strategies, larger-scale deployment, and online learning extensions.

  • Conference Article
  • Cite Count: 46
  • 10.1109/icais50930.2021.9395810
Phishing website detection using novel machine learning fusion approach
  • Mar 25, 2021
  • A Lakshmanarao + 2 more

Phishing is a type of social engineering attack often used to steal user data, including login credentials and credit card numbers. With advances in internet technology, websites have become a major vector for cyber-attacks. Several countermeasures are available for avoiding phishing attacks, but phishers change their attack methods from time to time. One of the most widely used techniques for solving cybersecurity issues is machine learning. Over the last several years, machine learning and deep learning techniques have proven suitable for solving security-related issues. Machine learning is well suited to detecting phishing attacks because most phishing attacks share common characteristics. This paper applies several machine learning techniques to detect phishing attacks. Two priority-based algorithms are proposed, and the final fusion classifier is decided based on their results. We used a dataset from UCI, applied a novel fusion classifier, and achieved an accuracy of 97%. We used Python to implement our experiments.

  • Research Article
  • Cite Count: 1
  • 10.11591/ijeecs.v36.i2.pp1273-1283
Phishing website detection using novel integration of BERT and XLNet with deep learning sequential models
  • Nov 1, 2024
  • Indonesian Journal of Electrical Engineering and Computer Science
  • Kongara Srinivasa Rao + 5 more

Phishing websites pose a significant threat to online security, necessitating robust detection mechanisms to safeguard users' sensitive information. This study explores the efficacy of various deep learning architectures for phishing website detection. Initially, traditional sequential models, including recurrent neural networks (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU), achieve accuracies of 95%, 96%, and 96.5%, respectively, on a curated dataset. Building upon these results, hybrid architectures that combine the strengths of traditional sequential models with state-of-the-art language representation models, bidirectional encoder representations from transformers (BERT) and XLNet, are investigated. Combinations such as RNN with BERT, BERT with LSTM, BERT with GRU, RNN with XLNet, XLNet with LSTM, and XLNet with GRU are evaluated. Through experimentation, accuracies of 94.5%, 96.5%, 96.1%, 95.7%, 97.4%, and 97%, respectively, are achieved, demonstrating the effectiveness of hybrid deep learning architectures in enhancing phishing detection performance. These findings contribute to advancing the state-of-the-art in cybersecurity practices and underscore the importance of leveraging diverse model types to combat online threats effectively.

  • Conference Article
  • Cite Count: 9
  • 10.1109/transai54797.2022.00025
Characterizing Coding Style of Phishing Websites Using Machine Learning Techniques
  • Sep 1, 2022
  • May Almousa + 2 more

Social engineering attacks pose a major threat to an internet user’s sensitive information, such as credit card information and passwords. One of the most common of these attacks is the phishing website. These websites appear to be legitimate in the hope that a user will unknowingly enter sensitive information into the malicious website. This paper attempts to identify and characterize the coding style of phishing websites using machine learning models. We used web scraping to extract the HTML content of around 29,000 phishing websites. The phishing websites were collected from PhishTank, which publicly tracks such websites. To compare the HTML coding styles and syntax in phishing websites and legitimate websites, we used a dataset of around 36,000 legitimate websites. We eliminated websites with missing basic content. From the cleaned datasets of phishing and legitimate websites, we processed 10,800 websites’ source codes (5,400 websites per category), extracting 11 features from every website’s content. Our Random Forest model achieved the best accuracy of 94.16% in detecting phishing websites.

  • Research Article
  • Cite Count: 25
  • 10.1016/j.asej.2024.102643
Detecting phishing websites through improving convolutional neural networks with Self-Attention mechanism
  • Jan 22, 2024
  • Ain Shams Engineering Journal
  • Yahia Said + 3 more


  • Research Article
  • Cite Count: 81
  • 10.14569/ijacsa.2017.080910
Phishing Website Detection based on Supervised Machine Learning with Wrapper Features Selection
  • Jan 1, 2017
  • International Journal of Advanced Computer Science and Applications
  • Waleed Ali

The problem of Web phishing attacks has grown considerably in recent years, and phishing is considered one of the most dangerous Web crimes, which may have severe negative effects on online business. In a Web phishing attack, the phisher creates a forged or phishing website to deceive Web users in order to obtain their sensitive financial and personal information. Several conventional techniques for detecting phishing websites have been suggested to cope with this problem. However, detecting phishing websites is a challenging task, as most of these techniques are not able to make an accurate decision dynamically as to whether a new website is phishing or legitimate. This paper presents a methodology for phishing website detection based on machine learning classifiers with a wrapper features selection method. In this paper, some common supervised machine learning techniques are applied with effective and significant features selected using the wrapper features selection approach to accurately detect phishing websites. The experimental results demonstrated that the performance of the machine learning classifiers was improved by using the wrapper-based features selection. Moreover, the machine learning classifiers with the wrapper-based features selection outperformed the machine learning classifiers with other features selection methods.
