Abstract

With the growing popularity of cryptocurrencies and their decentralized nature, the risk of fraudulent activity within these ecosystems has become a pressing concern. This paper focuses on Ethereum fraud detection using a dataset curated specifically for this purpose. The methodology comprises data cleaning, correlation analysis, data splitting, and exploratory data analysis to understand the characteristics of the data. Self-optimized machine learning models are then trained with the PyCaret library, and class imbalance is addressed using the SMOTE-NC (Synthetic Minority Over-sampling Technique for Nominal and Continuous features), ADASYN (Adaptive Synthetic Sampling), and K-Means SMOTE techniques. The models are evaluated on test and validation datasets using metrics such as accuracy, precision, recall, F1 score, and AUC (Area Under the Curve). The study finds that the ensemble models, particularly CatBoost (Categorical Boosting) and LightGBM (Light Gradient Boosting Machine), perform exceptionally well, with accuracy ranging from 97% to 98.42% after oversampling. These models also exhibit higher F1 scores and AUC values, indicating their potential to detect fraud effectively. The validation metrics lie in the same range, which suggests that the models do not suffer from overfitting. The experiment demonstrates the promise of ensemble models for Ethereum fraud detection, paving the way for deploying robust fraud detection systems in cryptocurrency ecosystems. Among the oversampling techniques, K-Means SMOTE yields the highest classification accuracy, 98.42%, with an AUC of 99.82%.
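
The oversampling idea shared by the techniques named above can be illustrated with a minimal sketch. It shows plain SMOTE-style interpolation on continuous features only. It is not SMOTE-NC, ADASYN, or K-Means SMOTE, and it is not the authors' PyCaret pipeline. The function name `smote_oversample`, the toy two-feature points, and the parameters `k` and `seed` are all hypothetical and chosen for illustration. Oversampling of this kind is typically applied to the training split only, so that the test and validation data keep their original class balance.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy training split (made-up values): fraud is the minority class.
normal = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.25), (0.25, 0.15), (0.05, 0.1)]
fraud = [(0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]

# Generate just enough synthetic fraud samples to match the majority class.
new_fraud = smote_oversample(fraud, n_new=len(normal) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(normal), len(balanced_fraud))
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without exact duplication. In practice a maintained implementation, such as the samplers in the imbalanced-learn package, would be used in place of this sketch.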
