Abstract

In machine learning problems with imbalanced classes, augmenting a dataset with synthetic samples is a common processing step for improving model performance. Another potential benefit of synthetic data is that it lets cooperating parties share information while preserving customer privacy. Often overlooked, however, is how the distribution of the data affects the potential gains from synthetic data augmentation. We present a case study in credit card fraud detection that uses Generative Adversarial Networks (GANs) to generate synthetic samples, with explicit consideration given to customer distributions. We investigate two cooperating-party scenarios, which yield four distinct customer distributions by credit quality. Our findings indicate that institutions skewed towards higher-credit-quality customers are more likely to benefit from augmentation with GANs. In the absence of feature set heterogeneity, the relative gains from synthetic data transfer also appear to asymmetrically favour banks operating at the lower end of the credit spectrum, which we hypothesise is due to differences in spending behaviours.
