Genome-wide association studies (GWAS) with binary or single phenotype data have successfully identified disease-associated genotypes and determinants of antimicrobial resistance. We describe a novel phenotype-to-genotype approach for a major bacterial pathogen that simultaneously tests for associations among multiple disease-related phenotypes and for linkages between phenotypic variation and genetic determinants. High-throughput assays quantified variation in 11 phenotypic traits among 163 Neisseria meningitidis serogroup W ST-11 clonal complex isolates. A comparison of carriage isolates and two disease subgroups detected significant between-group differences for eight phenotypic traits. Candidate genotype testing indicated that indels in csw, a capsular biosynthesis gene, were associated with reduced survival in antibody-depleted heat-inactivated serum. GWAS testing detected 341 significant genetic variants (3 single-nucleotide polymorphisms and 338 unitigs) across all traits except the serum bactericidal antibody-depleted assays. Growth traits were associated with variants of capsular biosynthesis genes, carbonic anhydrase, and an iron-uptake system, while adhesion-linked variation was located in pilC2, marR, and mutS. Multiple phase variation states, or combinatorial phasotypes, were associated with significant differences in multiple phenotypes. Controlling for group effects through regression and recursive random forest approaches detected group-independent effects of nalP on biofilm formation and of fetA on a growth trait. In random forest testing, nine phenotypes were weakly predictive of MenW:cc11 sub-lineage (original or 2013) among disease isolates, while three characteristics separated carriage and disease isolates with >80% accuracy.
This study demonstrates the power of combining high-throughput phenotypic testing of pathogenically relevant isolate collections with genomics to identify genetic determinants of specific disease-relevant phenotypes and to illuminate the pathobiology of microbial pathogens.
IMPORTANCE
Next-generation sequencing technologies have led to the creation of extensive microbial genome sequence databases for several bacterial pathogens. Mining these databases is now essential to realize the full benefit of these resources. We describe a high-throughput methodology for detecting associations between phenotypic variation in multiple disease-relevant traits and a range of genetic determinants for Neisseria meningitidis, a major causative agent of meningitis and septicemia. Phenotypic variation in 11 disease-related traits was determined for 163 isolates of the hypervirulent ST-11 lineage and linked to specific single-nucleotide polymorphisms, short sequence variants, and phase variation states. Applying machine learning algorithms to these data identified combinations of phenotypic traits and genetic variants predictive of disease association. This approach overcomes the limitations of generic metadata, such as disease versus carriage, and provides an avenue to explore the multifaceted nature of bacterial disease, carriage, and transmissibility traits.