By recognizing students’ facial expressions in actual classroom situations, students’ emotional states can be quickly uncovered, helping teachers gauge students’ learning progress and adjust their teaching strategies and methods accordingly, thereby improving the quality and effectiveness of classroom teaching. However, most previous facial expression recognition methods suffer from problems such as missing key facial features and imbalanced class distributions in the dataset, resulting in low recognition accuracy. To address these challenges, this paper proposes LCANet, a model built on a fused attention mechanism and a joint loss function that recognizes students’ emotions in real classroom scenarios. The model uses ConvNeXt V2 as the backbone network to strengthen its global feature extraction capability while attending more closely to the key regions of facial expressions. We incorporate an improved Channel Spatial Attention (CSA) module to extract more local feature information. Furthermore, to mitigate the class distribution imbalance in facial expression datasets, we introduce a joint loss function. The experimental results show that LCANet achieves good recognition rates on the public emotion datasets FERPlus, RAF-DB and AffectNet, with accuracies of 91.43%, 90.03% and 64.43%, respectively, demonstrating good robustness and generalizability. Additionally, we conducted experiments with the model in real classroom scenarios, detecting and accurately predicting students’ classroom emotions in real time, which provides an important reference for improving teaching in smart teaching scenarios.
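The abstract does not specify the exact form of the joint loss. A common way to address class imbalance is to combine standard cross-entropy with a focal-loss term that down-weights easy, well-classified examples; the sketch below illustrates that general pattern only, and the names `joint_loss`, `lam`, and `gamma` are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Multi-class focal loss, a common imbalance-aware term.

    probs  -- (N, C) predicted class probabilities (rows sum to 1)
    labels -- (N,) integer class indices
    gamma  -- focusing parameter; gamma=0 recovers plain cross-entropy
    alpha  -- optional (C,) per-class weights boosting rare classes
    """
    n = probs.shape[0]
    pt = probs[np.arange(n), labels]            # probability of the true class
    w = 1.0 if alpha is None else alpha[labels]
    # (1 - pt)^gamma shrinks the contribution of confident predictions
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt + 1e-12)))

def joint_loss(probs, labels, lam=0.5, gamma=2.0):
    """Illustrative joint loss: weighted sum of cross-entropy and focal loss."""
    n = probs.shape[0]
    ce = float(np.mean(-np.log(probs[np.arange(n), labels] + 1e-12)))
    return lam * ce + (1.0 - lam) * focal_loss(probs, labels, gamma)
```

With `gamma=0` the focal term reduces to cross-entropy, so `lam` interpolates between emphasizing all samples equally and emphasizing hard, misclassified ones, which is the usual motivation for such joint losses on imbalanced expression datasets.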