Abstract

Clinical phenotypes with matched genetic data from biobank-linked electronic health records (EHRs) are increasingly used for pleiotropy analyses. Thus far, pleiotropy analysis using individual-level EHR data has been limited to data from a single site. However, integrating EHR data from multiple sites is desirable because it improves detection power and the generalizability of the results. Due to privacy concerns, individual-level patient data are not easily shared across institutions. To address this, we introduce Sum-Share, a method designed to efficiently integrate EHR and genetic data from multiple sites to perform pleiotropy analysis. Sum-Share requires only summary-level data and one round of communication from each site, yet it produces test statistics identical to those obtained from pooled individual-level data. Consequently, Sum-Share achieves lossless integration of multiple datasets. Using real EHR data from eMERGE, Sum-Share identifies 1734 potential pleiotropic SNPs for five cardiovascular diseases.

Highlights

  • Clinical phenotypes with matched genetic data from biobank-linked electronic health records (EHRs) have been used for pleiotropy analyses

  • The gold-standard approach, known as individual patient-level data mega-analysis, would be to pool individual-level patient data from multiple EHRs and perform pleiotropy tests on the combined data. This is rarely feasible in real-world settings, because patient data are protected due to privacy concerns and cannot be shared across EHRs

  • Sum-Share, which is based on the composite likelihood approach, decomposes the desired overall test statistic of the pleiotropy test into EHR-specific test statistics


Summary

Introduction

Clinical phenotypes with matched genetic data from biobank-linked electronic health records (EHRs) have been used for pleiotropy analyses. Due to identifiability and privacy concerns, patients’ genetic and clinical information is often heavily protected and rarely shared across different EHRs. A potential solution is to use summary statistics to transfer information across datasets. We developed Sum-Share (SUMmary Statistics from multiple electronic HeAlth Records for plEiotropy), a method designed to efficiently integrate EHR and genetic data from multiple sites to detect pleiotropy. This method allows flexible covariate adjustment for each phenotype, is computationally more efficient than traditional methods, and yields results mathematically identical to those from analyses of pooled patient-level data from different sites. Sum-Share relies only on summary statistics from different sites.
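The claim that summary statistics alone can reproduce a pooled analysis can be illustrated with a much simpler model than the one Sum-Share uses. The sketch below is not the Sum-Share algorithm and does not implement its composite-likelihood pleiotropy test. It assumes ordinary least squares for a single continuous phenotype, with simulated data and hypothetical helper names (`site_summary`, `fit_from_summaries`). Each site shares only a few aggregate matrices in one round of communication, and the combined fit matches the fit on pooled individual-level data.

```python
import numpy as np

def site_summary(X, y):
    """Aggregates one site would share: X'X, X'y, y'y and the sample size."""
    return X.T @ X, X.T @ y, float(y @ y), X.shape[0]

def fit_from_summaries(summaries):
    """Combine per-site aggregates; return OLS coefficients and the
    t-statistic of the last column (the SNP in this illustration)."""
    XtX = sum(s[0] for s in summaries)
    Xty = sum(s[1] for s in summaries)
    yty = sum(s[2] for s in summaries)
    n = sum(s[3] for s in summaries)
    beta = np.linalg.solve(XtX, Xty)
    p = XtX.shape[0]
    rss = yty - beta @ Xty              # residual sum of squares at the OLS solution
    cov = (rss / (n - p)) * np.linalg.inv(XtX)
    return beta, beta[-1] / np.sqrt(cov[-1, -1])

# Simulated data for three hypothetical sites (all values are made up).
rng = np.random.default_rng(0)
sites = []
for n in (200, 350, 150):
    snp = rng.binomial(2, 0.3, size=n).astype(float)   # genotype coded 0/1/2
    age = rng.normal(50, 10, size=n)                   # one covariate
    X = np.column_stack([np.ones(n), age, snp])
    y = 0.5 + 0.02 * age + 0.1 * snp + rng.normal(size=n)
    sites.append((X, y))

# Summary-level analysis: only the aggregates leave each site.
beta_shared, t_shared = fit_from_summaries([site_summary(X, y) for X, y in sites])

# Reference analysis on pooled individual-level data.
X_pool = np.vstack([X for X, _ in sites])
y_pool = np.concatenate([y for _, y in sites])
beta_pooled, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)
resid = y_pool - X_pool @ beta_pooled
sigma2 = resid @ resid / (len(y_pool) - X_pool.shape[1])
t_pooled = beta_pooled[-1] / np.sqrt(sigma2 * np.linalg.inv(X_pool.T @ X_pool)[-1, -1])

print(np.allclose(beta_shared, beta_pooled), np.isclose(t_shared, t_pooled))
```

The equivalence holds here because the least-squares estimate depends on the data only through sums that add across sites. The summary above describes an analogous decomposition of the overall pleiotropy test statistic into EHR-specific pieces, but the details of that decomposition are in the full text, not in this sketch.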
