An Outlier Detection Approach on Credit Card Fraud Detection Using Machine Learning: A Comparative Analysis on Supervised and Unsupervised Learning
Credit card fraud is a socially relevant problem that raises many ethical issues and poses a serious threat to businesses around the world. Machine learning algorithms are applied to detect fraudulent transactions. The purpose of this paper is to identify the algorithm best suited to accurately finding fraud, treated as outliers, among supervised and unsupervised machine learning algorithms. The challenge lies in identifying and understanding these outliers accurately, and an outlier detection approach is put forward to address it. The effectiveness of four algorithms, namely local outlier factor, isolation forest, support vector machine, and logistic regression, is measured using evaluation metrics such as accuracy, precision, recall, F1-score, support, and the confusion matrix, along with micro, macro, and weighted averages. Under unsupervised learning, local outlier factor provides an accuracy of 99.7 and isolation forest provides an accuracy of 99.6. Similarly, under supervised learning, support vector machine provides an accuracy of 97.2 and logistic regression provides an accuracy of 99.8. Based on the experimental analysis, both of the algorithms used in unsupervised machine learning achieve high accuracy. The evaluation metric scores for unsupervised learning show good and balanced overall performance. Hence, it is concluded that unsupervised machine learning algorithms are relatively more suitable for practical applications of fraud and spam identification.
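The abstract reports per-class metrics together with micro, macro, and weighted averages. The following is a minimal sketch of how those averages can be computed, not the paper's implementation. The function names and the toy labels are hypothetical; the toy data simply assumes 995 legitimate and 5 fraudulent transactions.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, tp + fn)  # support = true count
    return scores


def averaged_f1(y_true, y_pred):
    """Macro, weighted, and micro averages of the per-class F1-score."""
    scores = per_class_scores(y_true, y_pred)
    total = len(y_true)
    # Macro: every class counts equally, however rare it is.
    macro = sum(s[2] for s in scores.values()) / len(scores)
    # Weighted: each class counts in proportion to its support.
    weighted = sum(s[2] * s[3] for s in scores.values()) / total
    # Micro: pooled counts; for single-label data this equals accuracy.
    micro = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    return {"macro": macro, "weighted": weighted, "micro": micro}


# Hypothetical toy data: a detector that never flags fraud (label 1).
y_true = [0] * 995 + [1] * 5
y_pred = [0] * 1000
class_scores = per_class_scores(y_true, y_pred)
averages = averaged_f1(y_true, y_pred)
```

On this toy data the micro average (accuracy) is 0.995 even though recall on the fraud class is zero, while the macro average falls below 0.5. This is one reason accuracy alone can be hard to interpret on imbalanced fraud data.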
288
- 10.1016/j.procs.2015.04.201
- Jan 1, 2015
- Procedia Computer Science
5694
- 10.1007/3-540-45014-9_1
- Jan 1, 2000
252
- 10.1109/icnsc.2004.1297040
- Mar 21, 2004
- Research Article
7
- 10.1007/s11042-023-17828-y
- Dec 16, 2023
- Multimedia Tools and Applications
Unleashing the power of explainable AI: sepsis sentinel's clinical assistant for early sepsis identification
- Book Chapter
- 10.1007/978-3-032-06069-3_26
- Oct 8, 2025
Towards Semi-supervised Subspace Learning for Outlier Detection in Big Data
- Book Chapter
24
- 10.5772/intechopen.94217
- Jun 9, 2021
A volcano is a complex system, and the characterization of its state at any given time is not an easy task. Monitoring data can be used to estimate the probability of an unrest and/or an eruption episode. These can include seismic, magnetic, electromagnetic, deformation, infrasonic, thermal, and geochemical data or, in an ideal situation, a combination of them. Merging data of different origins is a non-trivial task, and often even extracting a few relevant and information-rich parameters from a homogeneous time series is already challenging. The key to the characterization of volcanic regimes is in fact a process of data reduction that should produce a relatively small vector of features. The next step is the interpretation of the resulting features, through the recognition of similar vectors and, for example, their association with a given state of the volcano. This in turn can highlight possible precursors of unrests and eruptions. This final step can benefit from the application of machine learning techniques, which are able to process big data efficiently. Other applications of machine learning in volcanology include the analysis and classification of geological, geochemical and petrological “static” data to infer, for example, the possible source and mechanism of observed deposits, and the analysis of satellite imagery to quickly classify vast regions that are difficult to investigate on the ground or, again, to detect changes that could indicate an unrest. Moreover, the use of machine learning is gaining importance in other areas of volcanology, not only for monitoring purposes but also for differentiating particular geochemical patterns, resolving stratigraphic issues, differentiating morphological patterns of volcanic edifices, or assessing the spatial distribution of volcanoes. 
Machine learning is helpful in the discrimination of magmatic complexes, in distinguishing tectonic settings of volcanic rocks, in the evaluation of correlations of volcanic units, being particularly helpful in tephrochronology, etc. In this chapter we will review the relevant methods and results published in the last decades using machine learning in volcanology, both with respect to the choice of the optimal feature vectors and to their subsequent classification, taking into account both the unsupervised and the supervised approaches.
- Book Chapter
4
- 10.1007/978-981-16-8248-3_40
- Jan 1, 2022
Abstract In today’s world, people are increasingly inclined towards online shopping and payment, which is increasing the number of credit card users worldwide. As a result, fraudsters are also finding more opportunities for fraudulent activity. Credit card companies face huge losses if these frauds go undetected. There is a need for an efficient credit card fraud detection system that can detect these frauds and warn banks in time to prevent them. Many researchers have proposed models to solve this problem. These models use Data Science, Machine Learning, or Deep Learning algorithms, or combinations of them, to detect credit card fraud. This paper provides a comprehensive review of the fraud detection techniques used in these models, the datasets used in the research, and the evaluation parameters used to assess model performance. It also discusses the challenges faced in the fraud detection process. Keywords: Fraud detection, Data science, Machine learning, Deep learning
- Research Article
1
- 10.2308/isys-2022-026
- Jun 14, 2024
- Journal of Information Systems
ABSTRACT Auditors traditionally use sampling techniques to examine general ledger (GL) data, which suffer from sampling risks. Hence, recent research proposes full-population testing techniques, such as suspicion scoring, which rely on auditors’ judgment to recognize possible risk factors and develop corresponding risk filters to identify abnormal transactions. Thus, when auditors miss potential problems, the related transactions are not likely to be identified. This paper uses unsupervised outlier detection methods, which require no prior knowledge about outliers in a dataset, to identify outliers in GL data and tests whether auditors can gain new insights from those identified outliers. A framework called the Multilevel Outlier Detection Framework (MODF) is proposed to identify outliers at the transaction level, account level, and combination-by-variable level. Experiments with one real and one synthetic GL dataset demonstrate that the MODF can help auditors to gain new insights about GL data. Data Availability: The real dataset used in the experiment is not publicly available due to privacy policies. JEL Classifications: M410, M42.
- Book Chapter
- 10.1007/978-3-031-56998-2_7
- Jan 1, 2024
Detection of Malicious Activity on Credit Cards Using Machine Learning
- Research Article
17
- 10.1016/j.heliyon.2024.e25466
- Feb 1, 2024
- Heliyon
A soft voting ensemble learning approach for credit card fraud detection
- Book Chapter
6
- 10.1007/978-981-19-0095-2_23
- Jun 23, 2022
Abstract Credit card usage has increased significantly as a result of the fast development of e-commerce and the Internet. As a consequence, credit card theft has risen substantially in recent years. Fraud in the financial sector is expected to have far-reaching effects in the near future. In response, numerous scholars are concerned with financial fraud detection and prevention. To avoid bothering innocent consumers while detecting fraud, accuracy has become critical. We used hyperparameter optimization to see whether models created with different machine learning approaches differ significantly, and whether resampling strategies improve the performance of the suggested models. The hyperparameters are optimized using GridSearchCV. To test these hypotheses on data that has been divided into training and test sets, the GridSearchCV and random search methods are used. The maximum accuracy of 72.1% was achieved by a decision tree classifier on the imbalanced German credit card dataset. The maximum accuracy of 98.6% was achieved by LDA on the imbalanced European credit card dataset. Additionally, logistic regression and naïve Bayes were tested, and SMOTE was applied. Keywords: Decision tree, LDA, Gaussian Naïve Bayes, Logistic regression, Bernoulli Naïve Bayes, Credit card, GridSearchCV
- Research Article
9
- 10.2478/jaiscr-2023-0001
- Nov 28, 2022
- Journal of Artificial Intelligence and Soft Computing Research
Abstract Outlier detection aims to find a data sample that is significantly different from other data samples. Various outlier detection methods have been proposed and have been shown to be able to detect anomalies in many practical problems. However, in high dimensional data, conventional outlier detection methods often behave unexpectedly due to a phenomenon called the curse of dimensionality. In this paper, we compare and analyze outlier detection performance in various experimental settings, focusing on text data with dimensions typically in the tens of thousands. Experimental setups were simulated to compare the performance of outlier detection methods in unsupervised versus semi-supervised mode and uni-modal versus multi-modal data distributions. The performance of outlier detection methods based on dimension reduction is compared, and a discussion on using k-NN distance in high dimensional data is also provided. Analysis through experimental comparison in various environments can provide insights into the application of outlier detection methods in high dimensional data.
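The abstract above mentions using k-NN distance as an outlier score. The following is a minimal brute-force sketch of that idea in unsupervised mode, assuming Euclidean distance; the function name and the toy points are hypothetical and are not taken from the paper.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    Larger scores mean the point sits in a sparser region and is more
    likely to be an outlier. Brute force, O(n^2) distance computations.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data: a tight cluster plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_distance_scores(points, k=2)
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

In two dimensions the distant point clearly receives the largest score. In data with tens of thousands of dimensions, as the abstract notes, such distance-based scores can behave unexpectedly, which is why the paper also compares methods based on dimension reduction.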
- Research Article
1
- 10.1109/tps-isa56441.2022.00028
- Dec 1, 2022
- ... IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications : (TPS-ISA ...). IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications
Outlier detection is a fundamental data analytics technique used in many security applications. Numerous outlier detection techniques exist, and in most cases they are used to directly identify outliers without any interaction. The underlying data is often high dimensional and complex. Even though outliers may be identified, humans can easily grasp only low dimensional spaces, so it is difficult for a security expert to understand or visualize why a particular event or record has been identified as an outlier. In this paper we study the extent to which outlier detection techniques work in smaller dimensions and how well dimensional reduction techniques still enable accurate detection of outliers. This can help us to understand the extent to which data can be visualized while still retaining the intrinsic outlyingness of the outliers.
- Research Article
8
- 10.1016/j.sciaf.2024.e02386
- Sep 19, 2024
- Scientific African
Anomaly detection using unsupervised machine learning algorithms: A simulation study
- Research Article
2
- 10.1016/j.chemer.2024.126209
- Nov 1, 2024
- Geochemistry
Identifying geochemical anomalies associated with tungsten polymetallic mineralization using geographical detector and unsupervised machine learning methods: Application to the Nanling metallogenic belt, China
- Research Article
20
- 10.1109/embc46164.2021.9630535
- Nov 1, 2021
- Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Artifact detection and removal is a crucial step in all data preprocessing pipelines for physiological time series data, especially when collected outside of controlled experimental settings. The fact that such artifact is often readily identifiable by eye suggests that unsupervised machine learning algorithms, which do not require manually labeled training datasets, may be a promising option. Existing methods are often heuristic-based, not generalizable, or developed for controlled experimental settings with less artifact. In this study, we test the ability of three such unsupervised learning algorithms, isolation forests, 1-class support vector machine, and K-nearest neighbor distance, to remove heavy cautery-related artifact from electrodermal activity (EDA) data collected while six subjects underwent surgery. We first defined 12 features for each half-second window as inputs to the unsupervised learning methods. For each subject, we compared the best performing unsupervised learning method to four other existing methods for EDA artifact removal. For all six subjects, the unsupervised learning method was the only one successful at fully removing the artifact. This approach can easily be expanded to other modalities of physiological data in complex settings. Clinical Relevance: Robust artifact detection methods allow for the use of diverse physiological data even in complex clinical settings to inform diagnostic and therapeutic decisions.
- Book Chapter
26
- 10.1016/b978-0-12-821929-4.00002-0
- Jan 1, 2021
- Machine Learning Guide for Oil and Gas Using Python
Chapter 4 - Unsupervised machine learning: clustering algorithms
- Conference Article
3
- 10.4043/31297-ms
- Aug 9, 2021
Detection of anomalous events in the practical operation of oil and gas (O&G) wells and lines can help to avoid production losses, environmental disasters, and human fatalities, besides decreasing maintenance costs. Supervised machine learning algorithms have been successful in detecting, diagnosing, and forecasting anomalous events in the O&G industry. Nevertheless, these algorithms need large annotated datasets, and labelling data in real-world scenarios is typically infeasible because of the exhaustive expert work required. Therefore, since unsupervised machine learning does not require an annotated dataset, this paper performs a comparative performance evaluation of unsupervised learning algorithms to support experts in anomaly detection and pattern recognition in multivariate time-series data. The goal is to allow experts to analyze and label a small set of patterns instead of analyzing large datasets. This paper used the public 3W database of three offshore naturally flowing wells. The experiment used real O&G production data from underground reservoirs with the following anomalous events: (i) spurious closure of Downhole Safety Valve (DHSV) and (ii) quick restriction in Production Choke (PCK). Six unsupervised machine learning algorithms were assessed: Cluster-based Algorithm for Anomaly Detection in Time Series Using Mahalanobis Distance (C-AMDATS), Luminol Bitmap, SAX-REPEAT, k-NN, Bootstrap, and Robust Random Cut Forest (RRCF). The comparative evaluation of the unsupervised learning algorithms was performed using a set of metrics: accuracy (ACC), precision (PR), recall (REC), specificity (SP), F1-Score (F1), Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and Area Under the Precision-Recall Curve (AUC-PRC). The experiments used the data labels only for assessment purposes. 
The results revealed that unsupervised learning successfully detected the patterns of interest in multivariate data without prior annotation, with emphasis on the C-AMDATS algorithm. Thus, unsupervised learning can leverage supervised models through the support given to data annotation.
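Two of the metrics listed in this abstract, specificity and AUC-ROC, can be sketched as follows. This is an illustrative computation under stated assumptions, not the paper's evaluation code: labels are assumed to be 0 (normal) and 1 (anomalous), the 0.5 threshold and the toy scores are hypothetical, and AUC-ROC is computed through its pairwise-ranking interpretation.

```python
def specificity(y_true, y_pred):
    """True negative rate: share of normal points (label 0) left unflagged."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp) if tn + fp else 0.0


def auc_roc(y_true, scores):
    """Probability that a random anomaly outscores a random normal point.

    Ties count as half a win. Brute force over all anomaly/normal pairs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores; labels are used only for assessment.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

spec = specificity(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

Specificity depends on the chosen threshold, whereas AUC-ROC summarizes the ranking over all thresholds, so the two can disagree about which detector looks better.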
- Research Article
6
- 10.1016/j.trpro.2022.02.048
- Jan 1, 2022
- Transportation Research Procedia
Benchmarking machine learning algorithms by inferring transportation modes from unlabeled GPS data
- Research Article
- 10.22214/ijraset.2022.42974
- May 31, 2022
- International Journal for Research in Applied Science and Engineering Technology
Abstract: We live in a technology-driven world in which the development of communication and e-commerce has made the credit card the most common method of payment for both online and offline purchases. As e-commerce has grown, buying and selling products online has become easy and convenient in daily life. As a result, online payment and online banking with credit cards have increased. Fraud also occurs when a credit card is lost or stolen. Recently, due to COVID-19, everything has become contactless, so the use of credit cards has increased further. Because online shopping transactions and online payments are made from many places, it becomes difficult to distinguish genuine transactions from fraudulent ones, and therefore difficult for the bank to stop fraud. This paper explains how fraud can be detected with unsupervised machine learning, using algorithms such as Isolation Forest, Local Outlier Factor, and One-Class SVM. Keywords: Introduction, Machine learning, Supervised learning, Unsupervised learning
- Research Article
45
- 10.1016/j.apgeochem.2020.104679
- Jul 11, 2020
- Applied Geochemistry
Identification of multi-element geochemical anomalies using unsupervised machine learning algorithms: A case study from Ag–Pb–Zn deposits in north-western Zhejiang, China
- Conference Article
70
- 10.1109/confluence.2019.8776925
- Jan 1, 2019
Credit card transactions have become commonplace today, and so have the frauds associated with them. One of the most common ways to carry out fraud is to obtain the card information illegally and use it to make online purchases. For credit card companies and merchants, it is infeasible to detect these fraudulent transactions manually among thousands of normal transactions. If sufficient data is collected and made available, machine learning algorithms can be applied to solve this problem. In this work, popular supervised and unsupervised machine learning algorithms have been applied to detect credit card fraud in a highly imbalanced dataset. It was found that unsupervised machine learning algorithms can handle the skewness and give the best classification results.
- Book Chapter
2
- 10.5772/intechopen.94944
- May 18, 2022
The development of artificial intelligence (AI) algorithms for classifying undesirable events has gained prominence in the industrial world. However, training an AI algorithm requires labeled data that identifies the normal and anomalous operating conditions of the system, and such data is scarce or nonexistent because labeling demands a herculean effort from specialists. Thus, this chapter provides a performance comparison of six unsupervised Machine Learning (ML) algorithms for pattern recognition in multivariate time series data. The algorithms can identify patterns to assist the data annotation process in a semiautomatic way and, subsequently, support the training of supervised AI models. To verify the performance of the unsupervised ML algorithms in detecting patterns of interest or anomalies in real time series data, the six algorithms were applied in the following two cases: (i) meteorological data from a hurricane season and (ii) monitoring data from dynamic machinery for predictive maintenance purposes. Performance was evaluated with seven threshold indicators: accuracy, precision, recall, specificity, F1-Score, AUC-ROC and AUC-PRC. The results suggest that algorithms with a multivariate approach can be successfully applied to the detection of anomalies in multivariate time series data.
- Conference Article
12
- 10.1109/ims37962.2022.9865441
- Jun 19, 2022
In this paper, a novel unsupervised machine learning (ML) algorithm is presented for the expeditious RF fingerprinting of LoRa modulated chirps. Identification based on received signal strength indicator (RSSI) alone is unlikely to yield a robust means for sensor authentication within critical infrastructure deployment. Here, an unsupervised ML algorithm is used to rapidly train an artificial neural network (ANN) matrix, creating self-organizing maps (SOMs) for each authentic transmitter and a potential rogue node. A general classifier can be trained on the SOMs to precisely profile each transmitter as either genuine or rogue. In experimental validation, this methodology demonstrated 100% success in recognizing each transmitter as either a real or a rogue node.
- Research Article
- 10.55041/ijsrem35539
- Jun 6, 2024
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Data anomaly detection is crucial for maintaining the integrity, security, and efficiency of systems, helping to detect and respond to abnormal events promptly. In this paper, a hybrid approach is used for data anomaly detection in a credit card transaction system. It combines two unsupervised machine learning algorithms, Isolation Forest (IF) and One Class Support Vector Machine (OCSVM), with one supervised machine learning algorithm, Random Forest Classifier, as an ensemble method. The credit card fraud detection dataset from Kaggle is used for anomaly detection. As credit card transaction data is huge, varying, and unlabeled, unsupervised algorithms like IF and OCSVM are suitable for it. The ability of OCSVM to define what constitutes an anomaly, the power of Isolation Forest to detect outliers efficiently, and the use of the Random Forest Classifier as an ensemble of decision trees that complements both IF and OCSVM are combined to increase the accuracy of the system. The results of this paper show that this hybrid approach provides higher accuracy than any individual algorithm. Keywords: Anomaly detection, Isolation Forest, One Class SVM, Random forest classifier, Hybrid approach
- Research Article
- 10.1016/j.aeue.2023.154709
- May 15, 2023
- AEU - International Journal of Electronics and Communications
Identity-based attack detection using received signal strength in MIMO systems
- Research Article
19
- 10.3390/computers11040054
- Apr 8, 2022
- Computers
Within the context of Industry 4.0, quality assessment procedures using data-driven techniques are becoming more critical due to the generation of massive amounts of production data. In this paper, we address the detection of abnormal screw tightening processes, which is a key industrial task. Since labeling is costly, requiring a manual effort, we focus on unsupervised detection approaches. In particular, we assume a computationally light low-dimensional problem formulation based on angle–torque pairs. Our work is focused on two unsupervised machine learning (ML) algorithms: isolation forest (IForest) and a deep learning autoencoder (AE). Several computational experiments were held by assuming distinct datasets and a realistic rolling window evaluation procedure. First, we compared the two ML algorithms with two other methods, a local outlier factor method and a supervised Random Forest, on older data related with two production days collected in November 2020. Since competitive results were obtained, during a second stage, we further compared the AE and IForest methods by adopting a more recent and larger dataset (from February to March 2021, totaling 26.9 million observations and related to three distinct assembled products). Both anomaly detection methods obtained an excellent quality class discrimination (higher than 90%) under a realistic rolling window with several training and testing updates. Turning to the computational effort, the AE is much lighter than the IForest for training (around 2.7 times faster) and inference (requiring 3.0 times less computation). This AE property is valuable within this industrial domain since it tends to generate big data. Finally, using the anomaly detection estimates, we developed an interactive visualization tool that provides explainable artificial intelligence (XAI) knowledge for the human operators, helping them to better identify the angle–torque regions associated with screw tightening failures.
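The abstract above evaluates its detectors under a rolling window with several training and testing updates. The following is a minimal sketch of how such time-ordered splits can be generated. The function name, window sizes, and step are hypothetical, since the abstract does not state the values used.

```python
def rolling_windows(n_samples, train_size, test_size, step):
    """Yield (train_indices, test_indices) pairs that slide forward in time.

    Each test window starts right after its training window, so a model
    is always evaluated on observations newer than the ones it was fit on.
    """
    if min(train_size, test_size, step) < 1:
        raise ValueError("train_size, test_size and step must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


# Hypothetical sizes: 10 time-ordered observations, train on 4, test on 2.
splits = [(list(tr), list(te)) for tr, te in rolling_windows(10, 4, 2, 2)]
```

With these assumed sizes the generator produces three updates, the first training on observations 0 to 3 and testing on 4 and 5. A detector would be refit on each training window and scored on the matching test window.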
- Conference Article
3
- 10.1145/3440943.3444743
- Dec 12, 2020
This study proposes an anomaly detection technique for industrial control systems using supervised and unsupervised machine learning algorithms. The HIL-based Augmented ICS (HAI) dataset, provided for research on security in industrial control systems, was used for learning. The learning models were Light Gradient Boosted Machine, a supervised learning algorithm, and One-Class Support Vector Machine and Isolation Forest, two unsupervised learning algorithms. The proposed technique is presented in the following order: feature selection, data preprocessing, hyperparameter optimization and verification, and experiment and analysis of results. The experimental results show the performance differences across algorithms and model configurations. In addition, issues to be considered in model configuration and directions for future study of anomaly detection techniques in industrial control systems are presented based on the experimental results.