Abstract

Background: Bipolar Disorder (BD) is a heritable mood disorder with a lifetime prevalence of about 1% in the general population. Many BD-associated loci have been identified through Genome-Wide Association Studies (GWAS). However, larger sample sizes and more detailed clinical and phenotypic information are needed to further understand the etiology of BD. To expand the research resources for BD genetics, we sought to leverage Electronic Health Record (EHR) databases combined with genome-wide genetic data. As part of the International Cohort Collection for Bipolar Disorder (ICCBD), we developed automated phenotyping algorithms that identify BD patients using codified data and concepts extracted by Natural Language Processing (NLP) from clinical narratives. Using these algorithms, we identified BD cases and healthy controls in the Partners HealthCare Research Patient Data Registry (RPDR) and performed genome-wide genotyping of these cases and controls. We previously demonstrated that these algorithms have high positive predictive value against a gold standard of in-person structured interviews (Castro et al. Am J Psychiatry, 2015).

Methods: Here, we attempted to genetically validate these algorithm-identified BD cases and controls by 1) quantifying the contribution of genetic components to algorithm-identified BD and 2) estimating the genome-wide genetic similarity between algorithm-identified BD and BD ascertained by psychiatrist diagnosis or systematic interview. We evaluated BD cases identified with 4 different algorithms: an algorithm using narrative notes with NLP and a LASSO procedure (“NLP”) and 3 rule-based algorithms using codified data on patients' diagnostic and treatment history with decreasing levels of stringency: “coded-strict”, “coded-broad”, and coded-broad based on a single clinical encounter (“coded-SV”). We used LD score regression to estimate SNP-based heritability (h2g) for each of the 4 EHR-based BD definitions, and genetic correlation (rg) both among these definitions and between them and BD ascertained conventionally by diagnosis or interview in the Psychiatric Genomics Consortium (PGC) and ICCBD GWAS.

Results: We identified and genotyped 862 NLP, 1968 coded-strict, 2581 coded-broad and 408 coded-SV BD cases, and a set of 3952 controls of European ancestry. The estimated h2g were 0.24 (p=0.015), 0.09 (p=0.064), 0.13 (p=0.003) and 0 (p=0.591) for NLP, coded-strict, coded-broad and coded-SV BD, respectively. These h2g were similar to or lower than the h2g observed in the PGC+ICCBD GWAS (0.23, p=3.17E-80, total N=33181). However, the rg between conventionally ascertained BD and the EHR-based cases were high for NLP (0.66, p=3.69E-5), coded-strict (1.00, p=2.40E-4), and coded-broad (0.74, p=8.11E-7). The h2g for EHR-based cases combined was 0.11 (p=0.006) and the rg with PGC BD was 0.83 (p=2.88E-6). The rg between EHR-based BD definitions ranged from 0.90 to 0.98.

Discussion: These results provide the first genetic validation of automated EHR-based phenotyping for BD and suggest that this approach identifies cases that are highly genetically correlated with those ascertained through conventional methods. High-throughput phenotyping using the large data resources available in EHRs represents a viable method for accelerating psychiatric genetic research.
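
To make the two quantities reported in the abstract concrete, the following is a minimal sketch of the idea behind LD score regression, not the authors' pipeline. It assumes the standard model in which the expected GWAS chi-square statistic of SNP j is 1 + N*a + (N*h2/M)*l_j, where l_j is the SNP's LD score, N the sample size, M the number of SNPs, and a a confounding term, and it defines genetic correlation as the genetic covariance divided by the square root of the product of the two heritabilities. The function names, the unweighted least-squares fit, and all numbers are illustrative assumptions; real analyses use weighted regression with block-jackknife standard errors on observed summary statistics.

```python
import statistics


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD score regression (illustrative only).

    Regresses chi-square statistics on LD scores by ordinary least squares
    and rescales the slope to a SNP-heritability estimate:
        E[chi2_j] = intercept + (N * h2 / M) * l_j
    Returns (h2_estimate, intercept).
    """
    mean_l = statistics.fmean(ld_scores)
    mean_c = statistics.fmean(chi2)
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return slope * n_snps / n_samples, intercept


def genetic_correlation(gen_cov, h2_a, h2_b):
    """rg = genetic covariance / sqrt(h2_a * h2_b)."""
    return gen_cov / (h2_a * h2_b) ** 0.5


# Synthetic, noise-free data (hypothetical values, not from the study).
N, M, TRUE_H2 = 5000, 1000, 0.2
ld_scores = [1.0 + (j % 50) for j in range(M)]
chi2 = [1.0 + N * TRUE_H2 * l / M for l in ld_scores]

h2_est, intercept = ldsc_h2(chi2, ld_scores, N, M)
print(f"h2 estimate: {h2_est:.3f}, intercept: {intercept:.3f}")
print(f"rg example: {genetic_correlation(0.1, 0.2, 0.2):.3f}")
```

Because the synthetic chi-square values are generated without noise, the fit recovers the simulated heritability; with real summary statistics the estimate carries sampling error, which is why the abstract reports a p-value alongside each h2g and rg.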

Highlights

  • Although twin studies first documented the high heritability of bipolar disorder (BD) decades ago, only recently have robustly associated genetic risk loci been identified through genomewide association studies (GWAS)[1,2,3,4,5,6,7,8] (Chen et al., Translational Psychiatry (2018) 8:86)

  • The genetic correlation (rg) between electronic health record (EHR)-based BD definitions ranged from 0.90 to 0.98. These results provide the first genetic validation of automated EHR-based phenotyping for BD and suggest that this approach identifies cases that are highly genetically correlated with those ascertained through conventional methods

Summary

Introduction

Although twin studies first documented the high heritability of bipolar disorder (BD) decades ago, only recently have robustly associated genetic risk loci been identified through genomewide association studies (GWAS)[1,2,3,4,5,6,7,8]. The widespread adoption of longitudinal electronic health records (EHRs) has provided a vast and growing repository of phenotypic data that can be leveraged for psychiatric research[9]. When linked to sample collections through biobanks and other efforts, EHR data provide a relatively untapped opportunity to enhance the power of genetic research. In an effort to rapidly increase the samples available for genomewide studies of BD, we established the International Cohort Collection for Bipolar Disorder (ICCBD), through which we applied high-throughput phenotyping methods at sites in the United States (US), United Kingdom (UK), and Sweden[7]. At the US site (Partners HealthCare), we developed and applied EHR phenotyping algorithms to identify approximately 4500 cases and 5000 controls for whom DNA was obtained from discarded blood samples. We have undertaken systematic efforts to evaluate the validity of our EHR-based phenotyping algorithms.
