Abstract

Federated learning (FL) enables multiple organizations to jointly train a single model without revealing their private data to each other. FL can be classified as horizontal federated learning (HFL) or vertical federated learning (VFL) according to how samples and features overlap across the participants' datasets. VFL allows organizations to train a shared machine learning model on the overlap samples, that is, samples with the same identity across participants. However, VFL often suffers from an insufficient number of overlap samples among the participants, and this shortage degrades the performance of the global model. In this article, we propose FedDA, a data augmentation method based on the generative adversarial network (GAN), to increase the amount of training data. FedDA generates additional overlap data by learning the features of the limited overlap data and of the many nonoverlap data held locally, which expands the overlap dataset available for training. A series of experiments was conducted on MNIST and CIFAR-10. The results show that FedDA can efficiently utilize nonoverlap samples to enhance the effect of data augmentation, generating high-quality overlap samples and expanding the set of overlap samples. Thus, when VFL is short of overlap samples, FedDA can provide abundant training data to improve the performance of the VFL model.

Highlights

  • Machine learning explores the hidden information in a large volume of existing data, and it is clearly limiting when all of those data come from a single participant

  • To verify the performance of the proposed method, we designed a series of experiments on the MNIST and CIFAR-10 datasets

  • Each participant has a large amount of nonoverlap data, but these data are not utilized in vertical federated learning

Summary

Introduction

Machine learning explores the hidden information in a large volume of existing data, and it is clearly limiting when all of those data come from a single participant. Horizontal federated learning is usually applied to scenarios where the participants' datasets have nearly the same feature space but different sample identity spaces. Vertical federated learning, on the other hand, is used in scenarios where the participants' datasets have nearly the same sample identity space but different feature spaces. Much research focuses on how to establish a vertical federated learning model, but these approaches can usually use only the overlap data between participants. When the overlap data between multiple organizations are scarce, vertical federated learning yields a poorly performing model. Our main contributions are as follows:

(1) We design a novel federated data augmentation method for vertical federated learning, namely FedDA, to expand the number of available samples.
(2) We propose to use generative adversarial networks for data augmentation in vertical federated learning.
(3) We conduct a range of experiments on FedDA to demonstrate its effectiveness and study the quality of the data it generates under different data distributions on two typical datasets, MNIST and CIFAR-10.
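
To make the distinction between overlap and nonoverlap data concrete, the following is a minimal sketch in Python, not part of FedDA itself. It assumes two hypothetical parties, `party_a` and `party_b`, that hold different features for partly shared sample IDs; the names, IDs, and values are invented for illustration. It also ignores the privacy-preserving ID alignment that a real vertical federated learning system would need, and only shows which samples plain VFL can train on and which are left unused.

```python
# Hypothetical toy data: each party maps sample IDs to its own feature vector.
# The two parties hold DIFFERENT features (the vertical setting).
party_a = {
    "u1": [0.1, 0.5],
    "u2": [0.3, 0.9],
    "u3": [0.7, 0.2],
    "u4": [0.4, 0.4],
}
party_b = {
    "u3": [1.0],
    "u4": [0.0],
    "u5": [1.0],
}


def split_overlap(a, b):
    """Return (overlap IDs, IDs only in a, IDs only in b), each sorted."""
    overlap = sorted(a.keys() & b.keys())
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return overlap, only_a, only_b


def join_features(a, b, ids):
    """Concatenate both parties' features for the given shared IDs.

    Illustration only: a real VFL system would not pool raw features
    in one place like this.
    """
    return {i: a[i] + b[i] for i in ids}


overlap, only_a, only_b = split_overlap(party_a, party_b)
joined = join_features(party_a, party_b, overlap)

print("overlap samples:", overlap)
print("nonoverlap at A:", only_a)
print("nonoverlap at B:", only_b)
print("joined features:", joined)
```

In this toy setting only two of the five distinct samples are usable for plain VFL. The remaining nonoverlap samples are the ones that the augmentation described above aims to exploit.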

Related Work
The Proposed Approach
Experiment Evaluation
Findings
Conclusion
