Abstract

Background

Data sharing has been a major challenge in biomedical informatics because of privacy concerns. Contextual embedding models have demonstrated a strong representational capability to describe medical concepts (and their context), and they have shown promise as an alternative way to support deep-learning applications without disclosing the original data. However, contextual embedding models acquired from individual hospitals cannot be directly combined because their embedding spaces differ, and naive pooling renders the combined embeddings useless.

Objective

The aim of this study was to present a novel approach to address these issues and to promote sharing representations without sharing data. Without sacrificing privacy, we also aimed to build a global model from representations learned from local private data and to synchronize information from multiple sources.

Methods

We propose a methodology that harmonizes different local contextual embeddings into a global model. We used Word2Vec to generate contextual embeddings from each source and Procrustes analysis to fuse the different vector models into one common space, using a list of corresponding pairs as anchor points. We performed prediction analysis with the harmonized embeddings.

Results

We used sequential medical events extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database to evaluate the proposed methodology in predicting the next likely diagnosis of a new patient using either structured or unstructured data. Under different experimental scenarios, we confirmed that the global model built from harmonized local models achieves more accurate predictions than both the local models and the global models built from naive pooling.

Conclusions

Such aggregation of local models using our harmonization approach can serve as a proxy for a global model, combining information from a wide range of institutions and information sources. It allows information unique to a certain hospital to become available to other sites, increasing the fluidity of information flow in health care.
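
The alignment idea can be illustrated with orthogonal Procrustes. This is a minimal sketch under stated assumptions, not the authors' implementation: synthetic Gaussian vectors stand in for Word2Vec output, one site's space is treated as the reference, and the orthogonal variant of Procrustes is used (a common choice for aligning embeddings; the summary does not say which variant the study used). The function names and data are hypothetical. The matrix W minimizing ||X_anchor W - Y_anchor||_F over orthogonal matrices has the closed form W = U V^T, where U S V^T is the SVD of X_anchor^T Y_anchor.

```python
import numpy as np

def fit_procrustes(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimizing ||source @ W - target||_F.

    Rows of `source` and `target` embed the same anchor concepts
    in two different local spaces (hypothetical helper).
    """
    # Closed-form solution: W = U @ Vt, where U, S, Vt = svd(source.T @ target).
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

def mean_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Average row-wise cosine similarity between two matrices."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
n_concepts, dim, n_anchors = 200, 16, 50

# Synthetic "site A" embeddings for 200 medical codes (stand-in for Word2Vec).
site_a = rng.normal(size=(n_concepts, dim))

# Synthetic "site B": same geometry under an arbitrary rotation plus noise,
# mimicking an independently trained model at another hospital.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
site_b = site_a @ rotation + 0.05 * rng.normal(size=(n_concepts, dim))

# Fit W on the anchor pairs only (e.g., codes observed at both sites).
anchors = np.arange(n_anchors)
w = fit_procrustes(site_b[anchors], site_a[anchors])

# Apply W to every site-B vector, including concepts not used as anchors.
site_b_aligned = site_b @ w
held_out = np.arange(n_anchors, n_concepts)

print("unaligned:", round(mean_cosine(site_b[held_out], site_a[held_out]), 3))
print("aligned  :", round(mean_cosine(site_b_aligned[held_out], site_a[held_out]), 3))
```

Comparing or pooling site-B vectors with site-A vectors directly gives near-random cosine similarities on the held-out codes, whereas the vectors mapped through W line up closely. This is why fitting on a modest anchor list can transfer to the rest of the vocabulary.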

Highlights

  • As large datasets spanning genetics, microbiomes, nutrients, medicine, medical devices, and the environment are collected from large populations, there is a growing view that more effort should be devoted to reshaping this wealth of data and using it to promote precision medicine [1]

  • Using events related to common diagnoses shared across sites, we can derive a reasonable transformation matrix to apply to the rest of the data, even if we extend the method beyond the MIMIC-III database

  • Contextual embedding models are extremely useful in health care modeling because of their representativeness and applicability to downstream machine-learning models
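
As a hedged illustration of downstream use, the sketch below ranks candidate next diagnoses for a new patient by the cosine similarity between the mean embedding of the patient's observed events and each diagnosis embedding. This nearest-neighbor rule, the 3-dimensional vectors, the event codes, and the function names are all hypothetical stand-ins; the summary does not specify which predictive model the study used.

```python
import math

Vector = list[float]

def mean_vector(vectors: list[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("no known events to average")
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_next_diagnoses(history: list[str], embeddings: dict[str, Vector],
                        candidates: list[str]) -> list[tuple[str, float]]:
    """Score each candidate diagnosis against the patient's event history."""
    profile = mean_vector([embeddings[e] for e in history if e in embeddings])
    scored = [(c, cosine(profile, embeddings[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical harmonized embeddings, assumed to already share one space.
emb = {
    "lab:troponin_high": [0.9, 0.1, 0.0],
    "med:aspirin": [0.8, 0.2, 0.1],
    "dx:myocardial_infarction": [0.85, 0.15, 0.05],
    "dx:sepsis": [0.1, 0.9, 0.3],
    "dx:pneumonia": [0.0, 0.7, 0.7],
}

ranking = rank_next_diagnoses(
    ["lab:troponin_high", "med:aspirin"], emb,
    ["dx:myocardial_infarction", "dx:sepsis", "dx:pneumonia"],
)
print(ranking[0][0])  # dx:myocardial_infarction
```

A trained classifier would normally replace the cosine rule; the point here is only that one shared space lets events from any site feed the same scoring step.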


Summary

Introduction

As large datasets spanning genetics, microbiomes, nutrients, medicine, medical devices, and the environment are collected from large populations, there is a growing view that more effort should be devoted to reshaping this wealth of data and using it to promote precision medicine [1]. Such studies require large sample sizes to avoid false positives and insignificant results [3,4]. To gather such large samples, there have been efforts to share deidentified data, such as clinical notes, in compliance with the Health Insurance Portability and Accountability Act (HIPAA) [5]. There is an urgent need to develop new methods that share information learned from local sources so that research efforts can generalize and scale up. We aimed to build a global model from representations learned from local private data and to synchronize information from multiple sources. We propose a methodology that harmonizes different local contextual embeddings into a global model. This methodology allows information unique to a certain hospital to become available to other sites, increasing the fluidity of information flow in health care.
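
Once local models share one space, they still have to be combined. One simple rule, assumed here for illustration because the summary does not describe the exact merging step, is to average the aligned vectors of codes that several sites share and to carry over site-unique codes unchanged, so a code seen at only one hospital becomes available to all. The function name `merge_aligned_models` and the toy codes are hypothetical.

```python
def merge_aligned_models(models: list[dict[str, list[float]]]) -> dict[str, list[float]]:
    """Merge local embeddings that are already aligned to one common space.

    Shared codes get the element-wise mean of their site vectors;
    codes present at a single site are carried over unchanged.
    """
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for model in models:
        for code, vec in model.items():
            if code in sums:
                sums[code] = [s + v for s, v in zip(sums[code], vec)]
            else:
                sums[code] = list(vec)
            counts[code] = counts.get(code, 0) + 1
    return {code: [s / counts[code] for s in total] for code, total in sums.items()}

# Hypothetical aligned models from two hospitals.
hospital_a = {"dx:sepsis": [1.0, 0.0], "dx:pneumonia": [0.0, 1.0]}
hospital_b = {"dx:sepsis": [0.8, 0.2], "dx:rare_code_b": [0.5, 0.5]}

global_model = merge_aligned_models([hospital_a, hospital_b])
print(global_model["dx:sepsis"])  # [0.9, 0.1]
print(sorted(global_model))       # ['dx:pneumonia', 'dx:rare_code_b', 'dx:sepsis']
```

An unweighted mean treats every site equally; weighting each site's vector by how often the code occurs there is a natural alternative when site sizes differ.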

