Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study.

Dongchul Cha, Yu-Rang Park, Mindong Sung

doi:10.2196/26598

Abstract

Background: Machine learning (ML) is now widely deployed in our everyday lives. Building robust ML models requires a massive amount of data for training. Traditional ML algorithms require training data centralization, which raises privacy and data governance issues. Federated learning (FL) is an approach to overcome this issue. We focused on applying FL on vertically partitioned data, in which an individual’s record is scattered among different sites.

Objective: The aim of this study was to perform FL on vertically partitioned data to achieve performance comparable to that of centralized models without exposing the raw data.

Methods: We used three different datasets (Adult income, Schwannoma, and eICU datasets) and vertically divided each dataset into different pieces. Following the vertical division of data, overcomplete autoencoder-based model training was performed for each site. Following training, each site’s data were transformed into latent data, which were aggregated for training. A tabular neural network model with categorical embedding was used for training. A centrally based model served as the baseline and was compared with the FL model in terms of accuracy and area under the receiver operating characteristic curve (AUROC).

Results: The autoencoder-based network successfully transformed the original data into latent representations with no domain knowledge applied. These altered data were different from the original data in terms of the feature space and data distributions, indicating appropriate data security. The loss of performance was minimal when using an overcomplete autoencoder; accuracy loss was 1.2%, 8.89%, and 1.23%, and AUROC loss was 1.1%, 0%, and 1.12% in the Adult income, Schwannoma, and eICU datasets, respectively.

Conclusions: We proposed an autoencoder-based ML model for vertically incomplete data. Since our model is based on unsupervised learning, no domain-specific knowledge is required in individual sites. Under the circumstances where direct data sharing is not available, our approach may be a practical solution enabling both data protection and building a robust model.
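The pipeline described in the Methods (vertical split, one overcomplete autoencoder per site, aggregation of latent data, comparison against a centralized baseline) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the autoencoder is a single tanh layer trained with plain gradient descent in NumPy, logistic regression stands in for the tabular neural network with categorical embedding, the labels are assumed to be available to the aggregator, and all function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(x, hidden_dim, epochs=500, lr=0.1):
    """Fit a one-hidden-layer autoencoder by gradient descent; return its encoder."""
    n, d = x.shape
    w_enc = rng.normal(0.0, 0.3, (d, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    w_dec = rng.normal(0.0, 0.3, (hidden_dim, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        h = np.tanh(x @ w_enc + b_enc)              # latent representation
        err = (h @ w_dec + b_dec - x) / n           # gradient of the reconstruction loss
        grad_h = (err @ w_dec.T) * (1.0 - h ** 2)   # backpropagate through tanh
        w_dec -= lr * (h.T @ err)
        b_dec -= lr * err.sum(axis=0)
        w_enc -= lr * (x.T @ grad_h)
        b_enc -= lr * grad_h.sum(axis=0)
    return lambda z: np.tanh(z @ w_enc + b_enc)

def train_logreg(x, y, epochs=1000, lr=0.5):
    """Logistic regression stand-in for the downstream classifier (an assumption)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        w -= lr * (x.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return lambda z: (z @ w + b > 0).astype(int)

def accuracy(predict, features, labels):
    return float(np.mean(predict(features) == labels))

# Synthetic records: the label depends on one feature held by each site.
n_train, n_test = 300, 100
x = rng.normal(size=(n_train + n_test, 6))
y = (x[:, 0] + x[:, 3] > 0).astype(float)

# Vertical partition: both sites hold the same rows but different columns.
sites = [x[:, :3], x[:, 3:]]

# Each site trains its own overcomplete autoencoder (hidden_dim > input_dim)
# on its training rows and shares only the latent data, never the raw columns.
latent_parts = []
for site_x in sites:
    encode = train_autoencoder(site_x[:n_train], hidden_dim=2 * site_x.shape[1])
    latent_parts.append(encode(site_x))

# The aggregator concatenates the latent data and standardizes it
# using statistics from the training rows only.
latent = np.hstack(latent_parts)
mu = latent[:n_train].mean(axis=0)
sigma = latent[:n_train].std(axis=0) + 1e-8
latent = (latent - mu) / sigma
latent_train, latent_test = latent[:n_train], latent[n_train:]

# Centralized baseline on raw data versus the model trained on latent data.
predict_central = train_logreg(x[:n_train], y[:n_train])
predict_federated = train_logreg(latent_train, y[:n_train])
acc_central = accuracy(predict_central, x[n_train:], y[n_train:])
acc_federated = accuracy(predict_federated, latent_test, y[n_train:])
print(f"centralized accuracy: {acc_central:.3f}")
print(f"latent-data accuracy: {acc_federated:.3f}")
```

The gap between the two printed accuracies plays the role of the "loss of performance" reported in the Results; the numbers from this toy setup say nothing about the values reported for the Adult income, Schwannoma, or eICU datasets.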

Highlights

Machine learning (ML) is widely deployed in our daily lives, including, but not limited to, personalized digital media, product recommendations, and health care services
A centrally based model served as the baseline and was compared with the federated learning (FL) model in terms of accuracy and area under the receiver operating characteristic curve (AUROC)
We proposed an autoencoder-based ML model for vertically incomplete data

Summary

Introduction

Machine learning (ML) is widely deployed in our daily lives, including, but not limited to, personalized digital media, product recommendations, and health care services. Building high-quality ML models requires a huge amount of data for training [1]. Conventional ML algorithms typically require the training data to reside where the models are trained. The EU General Data Protection Regulation and the US […]. As more data are needed for a robust ML model, raw data are a crucial asset. Sharing raw data raises data governance issues, making data owners hesitant about sharing their data. We focused on applying federated learning (FL) on vertically partitioned data, in which an individual’s record is scattered among different sites.



Journal: JMIR Medical Informatics	Publication Date: Jun 9, 2021
Citations: 19	License type: cc-by
