Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions.

John A Lees, T Tien Mai, Julian Parkhill, Marco Galardini, Jukka Corander, Nicole E Wheeler, Samuel T Horsfield

doi:10.1128/mbio.01344-20

Abstract

Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. The variants selected by our joint modeling approach overlap substantially with those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold.

IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.

Highlights

Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics.
The elastic net uses a linear prediction model, as in a standard linear regression of a phenotype on all genetic variants, and seeks the slopes for all variants that best predict the phenotype.
A shrinkage term (λ) is used to prevent overfitting, adding a cost to each fitted slope that increases with its magnitude. This drives the slopes of many genetic variants to zero, so those variants can be removed from the model entirely.
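
The shrinkage behaviour described in the highlights can be illustrated with a minimal sketch. It assumes a glmnet-style objective, (1/2n)·||y − b0 − Xb||² + λ·(α·||b||₁ + (1 − α)/2·||b||₂²), fitted by coordinate descent on centred columns. The function names, the toy presence/absence matrix, and the values of `lam` and `alpha` are illustrative assumptions, not taken from the paper or its software.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, n_iter=100):
    """Coordinate-descent elastic net on centred data (hypothetical helper).

    X is a list of samples, each a list of 0/1 variant calls; y is the phenotype.
    lam is the shrinkage strength, alpha mixes the L1 and L2 penalties.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre every variant column and the phenotype so the intercept drops out.
    cols = [[X[i][j] - col_means[j] for i in range(n)] for j in range(p)]
    residual = [y[i] - y_mean for i in range(n)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # variant is constant across samples, carries no signal
            # Correlation of variant j with the residual that excludes its own effect.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [residual[i] - col[i] * delta for i in range(n)]
                beta[j] = new_beta
    intercept = y_mean - sum(col_means[j] * beta[j] for j in range(p))
    return intercept, beta


# Toy data (assumed): 8 isolates, 3 variants; only variant 0 tracks the phenotype.
X = [
    [1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0],
    [0, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

intercept, beta = fit_elastic_net(X, y, lam=0.1, alpha=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("slopes:", beta)
print("variants kept in the model:", selected)
```

In this toy case the slope of variant 0 is shrunk below its unpenalized value of 1, while the two unrelated variants receive slopes of exactly zero and drop out of the model. Larger values of `lam` shrink more strongly; real analyses would choose it by cross-validation rather than fixing it by hand.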

Summary

Introduction

Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. The overall problem of relating microbial genotype to phenotype has generally been approached by genome-wide association study (GWAS) methods [11,12,13]. Such methods determine whether there is sufficient evidence to conclude that a specific variant explains some proportion of the variation of a trait, after accounting for as many statistical artifacts as possible. The predictive variants in the identified sets may not fully overlap those significant by themselves, and the mapping does not necessarily lend itself readily to the same interpretation as P values in a GWAS.
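
For contrast with joint modeling, the per-variant testing that GWAS-style approaches rely on can be sketched as follows. This is a deliberately simplified illustration: it uses a two-sided Fisher exact test with a Bonferroni threshold, and it omits the population structure correction that real bacterial GWAS methods apply. The helper name, the toy data, and the 0.05 significance level are assumptions made for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows: variant present / absent. Columns: phenotype positive / negative.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k phenotype-positive carriers.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Toy data (assumed): 16 isolates, 3 variants stored column-wise.
phenotype = [1] * 8 + [0] * 8
variants = [
    [1] * 8 + [0] * 8,   # variant 0: perfectly tracks the phenotype
    [1, 0] * 8,          # variant 1: unrelated
    [1, 1, 0, 0] * 4,    # variant 2: unrelated
]

p_values = []
for calls in variants:
    a = sum(1 for g, ph in zip(calls, phenotype) if g == 1 and ph == 1)
    b = sum(1 for g, ph in zip(calls, phenotype) if g == 1 and ph == 0)
    c = sum(1 for g, ph in zip(calls, phenotype) if g == 0 and ph == 1)
    d = sum(1 for g, ph in zip(calls, phenotype) if g == 0 and ph == 0)
    p_values.append(fisher_exact_two_sided(a, b, c, d))

# Bonferroni: divide the significance level by the number of variants tested.
threshold = 0.05 / len(variants)
significant = [j for j, p in enumerate(p_values) if p < threshold]
print("P values:", p_values)
print("significant variants:", significant)
```

Each variant is judged on its own P value here, which is why the set of individually significant variants need not coincide with the set retained by a jointly fitted penalized model.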

Journal: mBio
Publication Date: Jul 7, 2020
Citations: 80
License type: CC BY 4.0
