Abstract

Widespread adoption of artificial intelligence (AI) in civilian and defense government agencies requires stakeholders to trust AI solutions. One of the five principles of ethical AI identified by the Department of Defense emphasizes that AI solutions be equitable. An AI system involves a series of choices, from data selection to model definition, each of which is subject to human and algorithmic biases and can lead to unintended consequences. This paper focuses on mitigating AI bias through the use of synthetic data. The proposed technique, named Fair-GAN, builds upon the recently developed Fair-SMOTE approach, which used synthesized data to fix class and other imbalances caused by protected attributes such as race and gender. Fair-GAN uses Generative Adversarial Networks (GAN) instead of the Synthetic Minority Oversampling Technique (SMOTE). While SMOTE can only synthesize tabular, numerical data, GAN can synthesize tabular data with numerical, binary, and categorical variables. GAN can also synthesize other data forms such as images, audio, and text. In our experiments, we use the Synthetic Data Vault (SDV), which implements approaches such as the conditional tabular GAN (CTGAN) and the tabular variational autoencoder (TVAE). We show the applicability of Fair-GAN to several benchmark problems used to evaluate the efficacy of AI bias mitigation algorithms. It is shown that Fair-GAN leads to significant improvements in metrics used for evaluating AI fairness, such as the statistical parity difference, disparate impact, average odds difference, and equal opportunity difference.
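To make the evaluation metrics concrete, the following is a minimal sketch (not the paper's own code) of how two of the cited fairness metrics, statistical parity difference and disparate impact, are conventionally computed from model predictions and a binary protected attribute. The encoding of 0 as the unprivileged group and 1 as the privileged group is an assumption for illustration.

```python
import numpy as np

def statistical_parity_difference(y_pred, protected):
    """P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged).

    y_pred: array of binary predictions (0/1).
    protected: array of group labels (assumed: 0 = unprivileged, 1 = privileged).
    A value of 0 indicates parity; negative values favor the privileged group.
    """
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    return y_pred[protected == 0].mean() - y_pred[protected == 1].mean()

def disparate_impact(y_pred, protected):
    """Ratio P(y_hat = 1 | unprivileged) / P(y_hat = 1 | privileged).

    A value of 1 indicates parity; the common "80% rule" flags values below 0.8.
    """
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    return y_pred[protected == 0].mean() / y_pred[protected == 1].mean()

# Tiny worked example: the privileged group receives favorable outcomes
# at rate 3/4 versus 2/4 for the unprivileged group.
y_pred = np.array([1, 0, 1, 1, 1, 1, 0, 0])
protected = np.array([1, 1, 1, 1, 0, 0, 0, 0])
spd = statistical_parity_difference(y_pred, protected)  # 0.5 - 0.75 = -0.25
di = disparate_impact(y_pred, protected)                # 0.5 / 0.75 ~= 0.667
```

A bias mitigation method such as Fair-GAN aims to move the statistical parity difference toward 0 and the disparate impact toward 1 on held-out data.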
