The enormous diversity of bacteriophages and their bacterial hosts makes it challenging to predict which phages infect a focal set of bacteria. Infection is largely determined by the complementary, and largely uncharacterized, genetics of adsorption, injection, cell takeover, and lysis. Here we present a machine learning approach to predict phage-bacteria interactions, trained on the genome sequences of, and phenotypic interactions among, 51 Escherichia coli strains and 45 phage strains that coevolved in laboratory conditions for 37 days. Leveraging multiple inference strategies and without a priori knowledge of driver mutations, this framework predicts both who infects whom and the quantitative level of infection across a suite of 2,295 potential interactions. We found that the most effective approach inferred interaction phenotypes from independent contributions of phage and bacterial mutations, accurately predicting of interactions while reducing the relative error in the estimated strength of the infection phenotype by . Feature selection revealed key phage and E. coli mutations that significantly influence the outcome of phage-bacteria interactions, corroborating sites previously known to affect phage infections and identifying mutations in genes of unknown function not previously shown to influence bacterial resistance. The method's success in recapitulating strain-level infection outcomes arising during coevolutionary dynamics may also help inform generalized approaches for imputing genetic drivers of interaction phenotypes in complex communities of phage and bacteria.
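As a rough illustration of what "independent contributions" from phage and bacterial mutations could mean, the sketch below scores each host-phage pair as an intercept plus a sum of per-mutation effects, with no host-by-phage cross terms. The abstract does not specify the model form, features, or fitting procedure, so everything here is an assumption: the strain names, binary mutation profiles, weights, and the zero threshold for calling an infection are all made-up toy values, not results from the study.

```python
from itertools import product

def pair_features(host_mutations, phage_mutations):
    """Concatenate binary mutation profiles; no host-by-phage cross terms."""
    return list(host_mutations) + list(phage_mutations)

def additive_score(features, weights, intercept=0.0):
    """Linear score: intercept plus the sum of per-mutation effects."""
    return intercept + sum(w * x for w, x in zip(weights, features))

# Hypothetical toy data: 1 means the mutation is present in that strain.
hosts = {"host_A": [0, 1], "host_B": [1, 1]}
phages = {"phage_X": [1, 0, 0], "phage_Y": [0, 1, 1]}

# Hypothetical effects: two host mutations (negative, i.e. resistance)
# followed by three phage mutations (positive, i.e. improved infectivity).
weights = [-0.5, -1.0, 0.75, 0.25, 0.25]
intercept = 1.0

# Score every host-phage pair in the cross-infection matrix.
scores = {
    (h, p): additive_score(pair_features(hm, pm), weights, intercept)
    for (h, hm), (p, pm) in product(hosts.items(), phages.items())
}

# Assumed decision rule: a positive score counts as "infects".
infects = {pair: s > 0 for pair, s in scores.items()}

# Size of the full matrix described in the abstract: 51 hosts x 45 phages.
n_pairs = 51 * 45
```

Because the score is additive, swapping one host for another shifts the score by the same amount for every phage. A model with host-by-phage interaction terms would not have this property, and that is the distinction the sketch is meant to show.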