Computationally efficient methods for estimating phenome-wide coheritability of multi-type phenotypes using biobank data

Abstract

Biobank data provide a rich source for studying the coheritability of multiple disease phenotypes, which can provide information on shared genetic etiology. However, the large number and heterogeneous types of phenotypes (e.g., continuous, discrete, time-to-event) pose significant statistical and computational challenges for estimating coheritability. In this work, we propose a unified modeling framework with latent random effects distinguishing genetic and family-shared environmental contributions to variation across multi-type phenotypes. To avoid high-dimensional integrals over many phenotypes and family members in joint likelihood approaches, we develop a computationally efficient procedure by first maximizing the marginal likelihood function for each individual phenotype and then estimating the coheritability using only pairs of phenotypes. We apply our method to analyze the heritability and coheritability of 290 phenotypes obtained from the UK Biobank. We find that a substantial number of phenotype pairs present statistically significant genetic coheritability.
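The two-stage idea in the abstract (fit each phenotype on its own, then work only with pairs of phenotypes) can be illustrated with a small sketch. This is not the authors' likelihood-based procedure: it swaps in a simple moment estimator for standardized continuous phenotypes measured on full-sibling pairs, it omits the family-shared environmental effect, and all function names and simulation settings are hypothetical. "Coheritability" here means the genetic covariance on the standardized scale, which is one common definition and may differ from the paper's.

```python
import math
import numpy as np


def simulate_sib_pairs(n_fam, sigma_g, sigma_e, rng):
    """Toy data: two full siblings per family, p continuous phenotypes each.

    Assumption: purely additive genetics, no shared environment.
    """
    p = sigma_g.shape[0]
    zero = np.zeros(p)
    # Full siblings share half of the additive genetic (co)variance.
    shared = rng.multivariate_normal(zero, 0.5 * sigma_g, size=n_fam)
    sibs = []
    for _ in range(2):
        unique = rng.multivariate_normal(zero, 0.5 * sigma_g, size=n_fam)
        env = rng.multivariate_normal(zero, sigma_e, size=n_fam)
        sibs.append(shared + unique + env)
    return sibs[0], sibs[1]


def cross_cov(a, b):
    """Sample covariance between two equal-length vectors."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))


def pooled_var(y1, y2, k):
    """Phenotypic variance of phenotype k, pooled over both siblings."""
    return 0.5 * (y1[:, k].var() + y2[:, k].var())


def stage1_heritability(y1, y2, k):
    """Stage 1: one phenotype at a time.

    Under the toy model, the sibling covariance is half the genetic variance.
    """
    return 2.0 * cross_cov(y1[:, k], y2[:, k]) / pooled_var(y1, y2, k)


def stage2_pair(y1, y2, k, l, h2):
    """Stage 2: one pair of phenotypes, reusing the stage 1 estimates h2.

    Returns (coheritability, genetic correlation). Noisy data could give a
    non-positive h2 product; this sketch does not guard against that.
    """
    cross = 0.5 * (cross_cov(y1[:, k], y2[:, l]) + cross_cov(y1[:, l], y2[:, k]))
    scale = math.sqrt(pooled_var(y1, y2, k) * pooled_var(y1, y2, l))
    coher = 2.0 * cross / scale
    r_g = coher / math.sqrt(h2[k] * h2[l])
    return coher, r_g


rng = np.random.default_rng(0)
sigma_g = np.array([[0.5, 0.2], [0.2, 0.4]])  # hypothetical genetic covariance
sigma_e = np.diag([0.5, 0.6])                 # hypothetical residual variances
y1, y2 = simulate_sib_pairs(200_000, sigma_g, sigma_e, rng)

h2 = [stage1_heritability(y1, y2, k) for k in range(2)]
coher, r_g = stage2_pair(y1, y2, 0, 1, h2)

# With 290 phenotypes, stage 1 needs 290 fits and stage 2 one fit per pair.
n_pairs = math.comb(290, 2)
print(h2, coher, r_g, n_pairs)
```

The point of the design is that no step ever involves more than two phenotypes at once, so the work grows with the number of phenotype pairs rather than requiring a joint model over all phenotypes.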

Similar Papers
  • Research Article
  • Cited by 5
  • 10.1016/j.jid.2023.02.010
Genetic Validation of Psoriasis Phenotyping in UK Biobank Supports the Utility of Self-Reported Data and Composite Definitions for Large Genetic and Epidemiological Studies
  • Mar 3, 2023
  • Journal of Investigative Dermatology
  • Jake R Saklatvala + 8 more

  • Research Article
  • Cited by 22
  • 10.1016/j.jaci.2020.04.050
Assessment of a causal relationship between body mass index and atopic dermatitis
  • May 17, 2020
  • Journal of Allergy and Clinical Immunology
  • Ashley Budu-Aggrey + 14 more

  • Research Article
  • Cited by 16
  • 10.1017/s0033291721000945
Associations and limited shared genetic aetiology between bipolar disorder and cardiometabolic traits in the UK Biobank
  • Mar 26, 2021
  • Psychological Medicine
  • Anna E Fürtjes + 4 more

People with bipolar disorder (BPD) are more likely to die prematurely, which is partly attributed to comorbid cardiometabolic traits. Previous studies report cardiometabolic abnormalities in BPD, but their shared aetiology remains poorly understood. This study examined the phenotypic associations and shared genetic aetiology between BPD and various cardiometabolic traits. In a subset of the UK Biobank sample (N = 61 508) we investigated phenotypic associations between BPD (ncases = 4186) and cardiometabolic traits, represented by biomarkers, anthropometric traits and cardiometabolic diseases. To determine shared genetic aetiology in European ancestry, polygenic risk scores (PRS) and genetic correlations were calculated between BPD and cardiometabolic traits. Several traits were significantly associated with increased risk for BPD, namely low total cholesterol, low high-density lipoprotein cholesterol, high triglycerides, high glycated haemoglobin, low systolic blood pressure, high body mass index, high waist-to-hip ratio; and stroke, coronary artery disease and type 2 diabetes diagnosis. BPD was associated with higher polygenic risk for triglycerides, waist-to-hip ratio, coronary artery disease and type 2 diabetes. Shared genetic aetiology persisted for coronary artery disease, when correcting PRS associations for cardiometabolic base phenotypes. Associations were not replicated using genetic correlations. This large study identified increased phenotypic cardiometabolic abnormalities in BPD participants. It is found that the comorbidity of coronary artery disease may be based on shared genetic aetiology. These results motivate hypothesis-driven research to consider individual cardiometabolic traits rather than a composite metabolic syndrome when attempting to disentangle driving mechanisms of cardiometabolic abnormalities in BPD.

  • Research Article
  • Cited by 24
  • 10.1038/mp.2017.189
Genetic contributions to Trail Making Test performance in UK Biobank
  • Sep 19, 2017
  • Molecular Psychiatry
  • S P Hagenaars + 8 more

The Trail Making Test (TMT) is a widely used test of executive function and has been thought to be strongly associated with general cognitive function. We examined the genetic architecture of the TMT and its shared genetic aetiology with other tests of cognitive function in 23 821 participants from UK Biobank. The single-nucleotide polymorphism-based heritability estimates for trail-making measures were 7.9% (part A), 22.4% (part B) and 17.6% (part B−part A). Significant genetic correlations were identified between trail-making measures and verbal-numerical reasoning (rg>0.6), general cognitive function (rg>0.6), processing speed (rg>0.7) and memory (rg>0.3). Polygenic profile analysis indicated considerable shared genetic aetiology between trail making, general cognitive function, processing speed and memory (standardized β between 0.03 and 0.08). These results suggest that trail making is both phenotypically and genetically strongly associated with general cognitive function and processing speed.

  • Abstract
  • 10.1016/j.euroneuro.2017.08.381
M74 - GENETIC CONTRIBUTIONS TO TRAIL MAKING TEST PERFORMANCE IN UK BIOBANK
  • Jan 1, 2019
  • European Neuropsychopharmacology
  • Saskia Hagenaars + 9 more

Background: The Trail Making Test (TMT) is a widely used neuropsychological test of executive function. TMT performance has been ascribed to a number of cognitive processes. Family- and twin-based studies have provided evidence for a genetic contribution to TMT performance. This study aims to identify genetic variants underlying performance of the TMT and the genetic overlap between TMT and other cognitive abilities. Methods: We examined the genetic architecture of TMT in 23,821 individuals from UK Biobank using GWAS, GCTA-GREML, and gene-based analysis. We tested for a shared genetic aetiology with other cognitive abilities using genetic correlations and polygenic profile scores in UK Biobank, Generation Scotland, and the Lothian Birth Cohort of 1936. Summary statistics based on the GWAS of TMT in the CHARGE consortium were used to create polygenic profile scores to predict TMT performance in UK Biobank. Results: The SNP-based heritability estimates for trail-making measures were 7.9% (part A), 22.4% (part B), and 17.6% (part B – part A). Significant genetic correlations were identified between trail-making measures and verbal-numerical reasoning (rg > 0.6), general cognitive function (rg > 0.6), processing speed (rg > 0.7), and memory (rg > 0.3). Polygenic profile analysis indicated considerable shared genetic aetiology between trail making, general cognitive function, processing speed, and memory (standardized β between 0.03 and 0.08). Discussion: These results, spanning methodologies and cohorts, provide strong evidence for a shared genetic architecture of TMT performance, general cognitive function and processing speed. These findings highlight the shared genetic architecture of cognitive abilities, and signpost new opportunities for clarifying the association between cognitive ability and health, as indicated by previous research.

  • Research Article
  • Cited by 7
  • 10.1038/s44161-024-00475-3
Genetically proxied HTRA1 protease activity and circulating levels independently predict risk of ischemic stroke and coronary artery disease.
  • May 20, 2024
  • Nature cardiovascular research
  • Rainer Malik + 12 more

Genetic variants in HTRA1 are associated with stroke risk. However, the mechanisms mediating this remain largely unknown, as does the full spectrum of phenotypes associated with genetic variation in HTRA1. Here we show that rare HTRA1 variants are linked to ischemic stroke in the UK Biobank and BioBank Japan. Integrating data from biochemical experiments, we next show that variants causing loss of protease function are associated with ischemic stroke, coronary artery disease and skeletal traits in the UK Biobank and MyCode cohorts. Moreover, a common variant modulating circulating HTRA1 mRNA and protein levels enhances the risk of ischemic stroke and coronary artery disease while lowering the risk of migraine and macular dystrophy in genome-wide association study, UK Biobank, MyCode and BioBank Japan data. We found no interaction between proxied HTRA1 activity and levels. Our findings demonstrate the role of HTRA1 for cardiovascular diseases and identify two mechanisms as potential targets for therapeutic interventions.

  • Research Article
  • Cited by 349
  • 10.1038/mp.2015.225
Shared genetic aetiology between cognitive functions and physical and mental health in UK Biobank (N=112 151) and 24 GWAS consortia.
  • Jan 26, 2016
  • Molecular Psychiatry
  • S P Hagenaars + 18 more

Causes of the well-documented association between low levels of cognitive functioning and many adverse neuropsychiatric outcomes, poorer physical health and earlier death remain unknown. We used linkage disequilibrium regression and polygenic profile scoring to test for shared genetic aetiology between cognitive functions and neuropsychiatric disorders and physical health. Using information provided by many published genome-wide association study consortia, we created polygenic profile scores for 24 vascular–metabolic, neuropsychiatric, physiological–anthropometric and cognitive traits in the participants of UK Biobank, a very large population-based sample (N=112 151). Pleiotropy between cognitive and health traits was quantified by deriving genetic correlations using summary genome-wide association study statistics and the method of linkage disequilibrium score regression. Substantial and significant genetic correlations were observed between cognitive test scores in the UK Biobank sample and many of the mental and physical health-related traits and disorders assessed here. In addition, highly significant associations were observed between the cognitive test scores in the UK Biobank sample and many polygenic profile scores, including coronary artery disease, stroke, Alzheimer's disease, schizophrenia, autism, major depressive disorder, body mass index, intracranial volume, infant head circumference and childhood cognitive ability. Where disease diagnosis was available for UK Biobank participants, we were able to show that these results were not confounded by those who had the relevant disease. These findings indicate that a substantial level of pleiotropy exists between cognitive abilities and many human mental and physical health disorders and traits and that it can be used to predict phenotypic variance across samples.

  • Research Article
  • Cited by 49
  • 10.1183/13993003.00199-2021
A large-scale genome-wide association analysis of lung function in the Chinese population identifies novel loci and highlights shared genetic aetiology with obesity.
  • Mar 25, 2021
  • European Respiratory Journal
  • Zhaozhong Zhu + 23 more

Background: Lung function is a heritable complex phenotype with obesity being one of its important risk factors. However, knowledge of their shared genetic basis is limited. Most genome-wide association studies (GWASs) for lung function have been based on European populations, limiting the generalisability across populations. Large-scale lung function GWASs in other populations are lacking. Methods: We included 100 285 subjects from the China Kadoorie Biobank (CKB). To identify novel loci for lung function, single-trait GWAS analyses were performed on forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC in the CKB. We then performed genome-wide cross-trait analysis between lung function and obesity traits (body mass index (BMI), BMI-adjusted waist-to-hip ratio and BMI-adjusted waist circumference) to investigate the shared genetic effects in the CKB. Finally, polygenic risk scores (PRSs) of lung function were developed in the CKB and their interaction with BMI's association on lung function were examined. We also conducted cross-trait analysis in parallel with the CKB using up to 457 756 subjects from the UK Biobank (UKB) for replication and investigation of ancestry-specific effects. Results: We identified nine genome-wide significant novel loci for FEV1, six for FVC and three for FEV1/FVC in the CKB. FEV1 and FVC showed significant negative genetic correlation with obesity traits in both the CKB and UKB. Genetic loci shared between lung function and obesity traits highlighted important biological pathways, including cell proliferation, embryo, skeletal and tissue development, and regulation of gene expression. Mendelian randomisation analysis suggested significant negative causal effects of BMI on FEV1 and on FVC in both the CKB and UKB. Lung function PRSs significantly modified the effect of change in BMI on change in lung function during an average follow-up of 8 years. Conclusion: This large-scale GWAS of lung function identified novel loci and shared genetic aetiology between lung function and obesity. Change in BMI might affect change in lung function differently according to a subject's polygenic background. These findings may open new avenues for the development of molecular-targeted therapies for obesity and lung function improvement.

  • Research Article
  • 10.1007/s11019-025-10276-5
Consent and its discontents: the case of UK Biobank
  • Jan 1, 2025
  • Medicine, Health Care, and Philosophy
  • Gulzaar Barn

UK Biobank is a major biomedical database and research resource, holding the genetic, health, and lifestyle information of half a million adult volunteers. Its datasets are accessible to approved researchers from academic, charity, government, and commercial organisations for health-related research in the public interest. Drawing upon a range of approved projects and the downstream applications of this research, I suggest that UK Biobank datasets have been processed towards ends that are inimical to its stated aims, breaking the terms of consent under which its participants entered the study. First, I provide an overview of the broad consent model employed by UK Biobank in recruiting participants and using their data. The consent documents and participant information leaflets used exhibit information failures in their framing of health-research in terms of disease and treatment, obscuring the full range of lawful uses of participants’ data. Beyond this, certain approved uses of UK Biobank data, including studies by insurance companies and direct-to-consumer genetic testing companies, arguably fall outside UK Biobank’s stated aims altogether. Moreover, UK Biobank has not adequately safeguarded against “dual use” issues. Tracking the trajectory of research outputs that used biobank data, I suggest that approved uses of biobank datasets have gone on to have objectionable further applications that are not in the public interest. Such applications include the development of polygenic scores that seek to predict “intelligence” for use in commercial embryo screening services. Such tools are rife with risk of harm and are being deployed without sufficient public deliberation or oversight.

  • Research Article
  • Cited by 12
  • 10.1161/circulationaha.121.057139
Microvascular Outcomes in Women With a History of Hypertension in Pregnancy.
  • Feb 15, 2022
  • Circulation
  • Michael C Honigberg + 3 more

  • Research Article
  • Cited by 4
  • 10.1016/j.thromres.2020.08.015
Polygenic risk score-analysis of thromboembolism in patients with acute lymphoblastic leukemia
  • Aug 12, 2020
  • Thrombosis Research
  • Kirsten Brunsvig Jarvis + 14 more

Introduction: Thromboembolism (TE) is a common and serious toxicity of acute lymphoblastic leukemia (ALL) treatment, but studies of genetic predisposition have been underpowered with conflicting results. We tested whether TE in ALL and TE in the general adult population have a shared genetic etiology. Materials and methods: We prospectively registered TE events and collected germline DNA in patients 1.0–45.9 years in the Nordic Society of Pediatric Hematology and Oncology (NOPHO) ALL2008 study (7/2008–7/2016). Based on summary statistics from two large genome-wide association studies (GWAS) on venous TE in adults (the International Network of VENous Thromboembolism Clinical Research Networks (INVENT) consortium and the UK Biobank), we performed polygenic risk score (PRS) analysis on TE development in the NOPHO cohort, progressively expanding the PRS by increasing the p-value threshold of single nucleotide polymorphism (SNP) inclusion. Results and conclusion: Eighty-nine of 1252 patients with ALL developed TE, 2.5 year cumulative incidence 7.2%. PRS of genome-wide significant SNPs from the INVENT and UK Biobank data were not significantly associated with TE, HR 1.16 (p 0.14) and 1.02 (p 0.86), respectively. Expanding PRS by increasing p-value threshold did not reveal polygenic overlap. However, subgroup analysis of adolescents 10.0–17.9 years (n = 231), revealed significant polygenic overlap with the INVENT GWAS. The best fit PRS, including 16,144 SNPs, was associated with TE with HR 1.76 (95% CI 1.23–2.52, empirical p-value 0.02). Our results support an underlying genetic predisposition for TE in adolescents with ALL and should be explored further in future TE risk prediction models.

  • Research Article
  • Cited by 196
  • 10.1016/j.jaci.2020.06.001
Association of asthma and its genetic predisposition with the risk of severe COVID-19
  • Jun 6, 2020
  • Journal of Allergy and Clinical Immunology
  • Zhaozhong Zhu + 5 more

  • Research Article
  • 10.1017/thg.2025.15
Genetic Similarity Clustering Using the UK Biobank as a Reference Dataset.
  • Apr 1, 2025
  • Twin research and human genetics : the official journal of the International Society for Twin Studies
  • Ngoc-Quynh Le + 2 more

Incorporating genetic data from diverse populations is crucial for understanding genetic contributions to diseases and ensuring health equity in healthcare practices. However, existing reference panels either capture a limited number of populations or have small sample sizes. We examine the UK Biobank's performance as a reference for clustering genetically similar individuals. Leveraging data from participants of diverse origins, we aim to improve population representation and mitigate bias caused by the limited number of populations in other reference panels. We combined countries of birth and ethnic backgrounds data fields from the UK Biobank and genetic information to infer genetically similar population labels. A random forest model was then trained on genetic principal components to identify each individual's most genetically similar population. The model's performance was validated using the 1000 Genomes and the CARTaGENE biobank data. We identified more diverse reference populations than present in datasets such as 1000 Genomes, covering 19 populations worldwide. Our model achieved medium to high precision and recall for most labeled populations, although lower rates were observed in closely related groups. For instance, we identified 519 people in CARTaGENE most genetically similar to the Middle Eastern reference sample derived in the UK Biobank (there are no Middle Eastern samples in 1000 Genomes), yielding an 81.1% precision and a 97.0% recall rate compared to demographic-based information. This practical approach of clustering genetically similar individuals utilizing existing biobank data may facilitate downstream analyses, such as genomewide association studies or polygenic risk scores in underrepresented populations in genetic studies.

  • Research Article
  • Cited by 32
  • 10.1371/journal.pgen.1008202
Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb.
  • Jun 13, 2019
  • PLOS Genetics
  • Lars G Fritsche + 17 more

Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.

  • Research Article
  • Cited by 89
  • 10.1001/jamacardio.2018.2287
Association of Interleukin 6 Receptor Variant With Cardiovascular Disease Effects of Interleukin 6 Receptor Blocking Therapy
  • Aug 8, 2018
  • JAMA Cardiology
  • Tianxi Cai + 98 more

Importance: Electronic health record (EHR) biobanks containing clinical and genomic data on large numbers of individuals have great potential to inform drug discovery. Individuals with interleukin 6 receptor (IL6R) single-nucleotide polymorphisms (SNPs) who are not receiving IL6R blocking therapy have biomarker profiles similar to those treated with IL6R blockers. This gene-drug pair provides an example to test whether associations of IL6R SNPs with a broad range of phenotypes can inform which diseases may benefit from treatment with IL6R blockade. Objective: To determine whether screening for clinical associations with the IL6R SNP in a phenome-wide association study (PheWAS) using EHR biobank data can identify drug effects from IL6R clinical trials. Design, Setting, and Participants: Diagnosis codes and routine laboratory measurements were extracted from the VA Million Veteran Program (MVP); diagnosis codes were mapped to phenotype groups using published PheWAS methods. A PheWAS was performed by fitting logistic regression models for testing associations of the IL6R SNPs with 1342 phenotype groups and by fitting linear regression models for testing associations of the IL6R SNP with 26 routine laboratory measurements. Significance was reported using a false discovery rate of 0.05 or less. Findings were replicated in 2 independent cohorts using UK Biobank and Vanderbilt University Biobank data. The Million Veteran Program included 332 799 US veterans; the UK Biobank, 408 455 individuals from the general population of the United Kingdom; and the Vanderbilt University Biobank, 13 835 patients from a tertiary care center. Exposures: IL6R SNPs (rs2228145; rs4129267). Main Outcomes and Measures: Phenotypes defined by International Classification of Diseases, Ninth Revision codes. Results: Of the 332 799 veterans included in the main cohort, 305 228 (91.7%) were men, and the mean (SD) age was 66.1 (13.6) years. The IL6R SNP was most strongly associated with a reduced risk of aortic aneurysm phenotypes (odds ratio, 0.87-0.90; 95% CI, 0.84-0.93) in the MVP. We observed known off-target effects of IL6R blockade from clinical trials (eg, higher hemoglobin level). The reduced risk for aortic aneurysms among those with the IL6R SNP in the MVP was replicated in the Vanderbilt University Biobank, and the reduced risk for coronary heart disease was replicated in the UK Biobank. Conclusions and Relevance: In this proof-of-concept study, we demonstrated application of the PheWAS using large EHR biobanks to inform drug effects. The findings of an association of the IL6R SNP with reduced risk for aortic aneurysms correspond with the newest indication for IL6R blockade, giant cell arteritis, of which a major complication is aortic aneurysm.

More from: Communications Biology
  • Research Article
  • 10.1038/s42003-025-08883-2
Kinetoplast DNA structure, RNA editing patterns and small respiratory Complex I in Trypanosoma musculi.
  • Nov 7, 2025
  • Communications biology
  • Ju-Feng Wang + 5 more

  • Research Article
  • 10.1038/s42003-025-08909-9
Cross-platform motif discovery and benchmarking to explore binding specificities of poorly studied human transcription factors.
  • Nov 7, 2025
  • Communications biology
  • Ilya E Vorontsov + 33 more

  • Research Article
  • 10.1038/s42003-025-09120-6
The composition and structure of the outer kinetochore KMN complex is conserved across kingdoms.
  • Nov 7, 2025
  • Communications biology
  • Dipesh Kumar Singh + 7 more

  • Addendum
  • 10.1038/s42003-025-09139-9
Author Correction: Cellular senescence in white matter microglia is induced during ageing in mice and exacerbates the neuroinflammatory phenotype.
  • Nov 6, 2025
  • Communications biology
  • Tatsuyuki Matsudaira + 14 more

  • Research Article
  • 10.1038/s42003-025-08895-y
The influence of environment on bacterial co-abundance in the gut microbiomes of healthy human individuals.
  • Nov 6, 2025
  • Communications biology
  • Christophe Boetto + 11 more

  • Research Article
  • 10.1038/s42003-025-08903-1
A mismatch in enzyme-redox partnerships underlies divergent cytochrome P450 activities between human hepatocytes and microsomes.
  • Nov 6, 2025
  • Communications biology
  • Tashinga E Bapiro + 8 more

  • Research Article
  • 10.1038/s42003-025-08905-z
The need for speed: drivers and consequences of accelerated replication forks.
  • Nov 6, 2025
  • Communications biology
  • Dávid Lukáč + 2 more

  • Research Article
  • 10.1038/s42003-025-08897-w
CAPRIN1 specifically mediates m6A modification of RIG-I RNA to inhibit Mycobacterium Tuberculosis infection.
  • Nov 6, 2025
  • Communications biology
  • Lijuan Zhou + 19 more

  • Research Article
  • 10.1038/s42003-025-08907-x
Oncostatin M induces epigenetic reprogramming in renal cell carcinoma-associated endothelial cells.
  • Nov 6, 2025
  • Communications biology
  • Hieu-Huy Nguyen-Tran + 2 more

  • Research Article
  • 10.1038/s42003-025-08896-x
The association of the rumen virome with methane emissions in dairy cattle.
  • Nov 6, 2025
  • Communications biology
  • Carlos Navarro Marcos + 4 more
