Author response: The landscape of transcriptional and translational changes over 22 years of bacterial adaptation

Premal Shah, Alexander L Cope, Srujana S Yadavalli, Shun Liang, John S Favate

doi:10.7554/elife.81979.sa2

Abstract

Organisms can adapt to an environment by taking multiple mutational paths. This redundancy at the genetic level, where many mutations have similar phenotypic and fitness effects, can make untangling the molecular mechanisms of complex adaptations difficult. Here, we use the Escherichia coli long-term evolution experiment (LTEE) as a model to address this challenge. To understand how different genomic changes could lead to parallel fitness gains, we characterize the landscape of transcriptional and translational changes across 12 replicate populations evolving in parallel for 50,000 generations. By quantifying absolute changes in mRNA abundances, we show that not only do all evolved lines have more mRNAs but that this increase in mRNA abundance scales with cell size. We also find that despite few shared mutations at the genetic level, clones from replicate populations in the LTEE are remarkably similar in their gene expression patterns at both the transcriptional and translational levels. Furthermore, we show that the majority of the expression changes are due to changes at the transcriptional level with very few translational changes. Finally, we show how mutations in transcriptional regulators lead to consistent and parallel changes in the expression levels of downstream genes. These results deepen our understanding of the molecular mechanisms underlying complex adaptations and provide insights into the repeatability of evolution.

Editor's evaluation

This paper comprehensively analyzes how gene expression has changed in eleven E. coli strains after 50,000 generations of laboratory evolution. 
It confirms that, overall, changes in RNA levels are more reproducible than the underlying genetic changes and begins to investigate how some of these changes lead to increased fitness in this environment. This dataset will be a valuable resource for testing theories about how genotypic and phenotypic evolution are coupled and for understanding how bacterial gene regulatory networks evolve during adaptation. https://doi.org/10.7554/eLife.81979.sa0

eLife digest

The reason we look like our parents is because we inherit their genes. Genes carry the instructions for our cells to make messenger RNAs (mRNAs), which our cells then translate into proteins. Proteins, in turn, determine many of our features. This is true for all living organisms. Any changes – or mutations – in an organism’s genes can lead to variations in its proteins, which can alter the organism’s traits. This is the basis for evolution: mutations can lead to changes that allow an organism to better adapt to a new environment. This increases the organism’s chances of survival and reproduction – its evolutionary ‘fitness’ – and makes it more likely that the mutation that generated the new trait in the first place will be passed on to the organism’s descendants. However, just because two organisms have evolved similar traits to adapt to similar environments, it does not mean that the genetic basis for the adaptation is the same. For example, many animals share similar coloring to warn off predators, but the way that coloring is coded genetically is completely different. In species that are related (which share many of the same genes), this type of evolution is called ‘parallel evolution’, and it can make it difficult for scientists to understand how an organism evolved and pinpoint exactly what mutations are linked to which features. In 1988, scientists established the ‘long-term evolution experiment’ to tackle questions about how evolution works. 
The experiment, which has been running for over 30 years, consists of tracking the evolution of 12 populations of Escherichia coli bacteria grown in separate flasks containing the same low-nutrient medium. The initial 12 populations were genetically identical, making this an ideal system to study parallel evolution, since all the populations had to evolve to adapt to the same environment, whilst isolated from each other. In previous experiments, scientists had already noted that while the different bacterial populations grew in similar ways, they had mostly different mutations. To better understand parallel evolution, Favate et al. analyzed the synthesis rates of RNA and proteins in the E. coli populations used in the long-term evolution experiment. They found that 22 years after the start of the experiment, all 12 populations produced more RNA, grew faster and were bigger. Additionally, while the different populations had accumulated few shared mutations after 22 years, they all shared similar patterns of RNA levels and protein synthesis rates. Further probing revealed that parallel evolution may be linked to how genes are regulated: mutations in regulators of related groups of genes involved in the same processes inside the cell can amplify the degree of parallel changes in organisms. This means that mutations in these genes may lead to similar traits. These findings provide insight into how parallel evolution arises in the long-term evolution experiment, and provide clues as to how the same traits can evolve several times.

Introduction

A key challenge in biology is understanding the relationships between genotype, phenotype, and evolutionary fitness. Comparative genomic approaches and large-scale mutation experiments have allowed us to map genetic changes to phenotypic changes underlying adaptation. 
For example, mutations that increase the affinity of hemoglobin for oxygen are adaptive in high-altitude dwelling deer mice (Natarajan et al., 2013), and mutations to the influenza haemagglutinin and neuraminidase proteins increase viral fitness (Gong et al., 2013; Lee et al., 2018). Adaptive phenotypes can also result from changes in multiple genes, such as in yeast evolving under nutrient limitation (Gresham et al., 2008; Lauer et al., 2018; Venkataram et al., 2016), bacterial adaptation during infection (Lieberman et al., 2011) or to high temperature (Tenaillon et al., 2012), and in the evolution of smaller body sizes in Atlantic silversides under a size-selective fishing regime (Therkildsen et al., 2019). In many cases, similar adaptive phenotypes arise from different mutations to the same gene or regulatory region or from combinations of mutations to different genes and regulatory regions. This redundancy, where many genotypes produce similar phenotypes, makes it difficult to understand the molecular mechanisms behind adaptive phenotypes and is exacerbated by potential epistatic interactions among mutations. On the other hand, adaptive changes to expression have been shown to occur during the domestication of eggplants and tomatoes (Koenig et al., 2013; Page et al., 2019), and in hybridization events between two weeds (Kryvokhyzha et al., 2019). Although not direct observations of adaptive changes to gene expression, recent comparative analyses of across-species gene expression suggest that the expression levels of numerous genes are evolving under directional selection in vertebrates, fish, and butterflies (Brawand et al., 2011; Catalán et al., 2019; Fukushima and Pollock, 2020; Gillard et al., 2021). Here, we use the long-term evolution experiment (LTEE) (Lenski et al., 1991) as a model to characterize the molecular changes underlying adaptation to a novel environment. 
In the LTEE, 12 replicate populations of Escherichia coli have been adapting in parallel to a carbon-limited medium since 1988, growing over 75,000 generations thus far. As is common in lab-based evolution experiments, the replicate populations display similar phenotypic changes (Blount et al., 2018). Examples include increases in fitness (Wiser et al., 2013) and cell size (Grant et al., 2021; Philippe et al., 2009). In contrast, a significant amount of diversity exists at the genomic level across the replicates (Tenaillon et al., 2016), with some lines having orders of magnitude more mutations than others due to the development of mutator phenotypes (Good et al., 2017). While few mutations are shared at the nucleotide level, some genes are commonly mutated across evolved lines (Maddamsetti et al., 2017; Woods et al., 2006). Still, how most of the mutations affect fitness in the system is unknown. Researchers have hypothesized that similar gene expression patterns might contribute to the parallel increases in fitness in the LTEE (Fox and Lenski, 2015). An earlier microarray-based study of transcriptional changes in LTEE showed parallel changes in mRNA abundances in clones from two evolved lines (Ara-1 and Ara+1) at 20,000 generations (Cooper et al., 2003). However, it remained unclear which mutations were responsible for these parallel changes and whether the remaining 10 lines also had similar expression profiles. Moreover, protein-coding mRNAs must be translated to perform their function. The majority of cellular biomass and energy expenditure is devoted to translation (Bernier et al., 2018), and the role of hierarchical regulation of gene expression in evolutionary processes has been a subject of debate in recent years (Albert et al., 2014; Artieri and Fraser, 2014; McManus et al., 2014). However, we know little of changes in gene expression at the translational level in the LTEE. 
Here, we use both RNA-seq and Ribo-seq (Ingolia et al., 2009) to profile the landscape of transcriptional and translational changes after 22 years (50,000 generations) of evolution in the LTEE to answer five fundamental questions: (i) do evolved lines show similar transcriptomic and translatomic changes after 50,000 generations despite acquiring mostly unique sets of mutations? (ii) how do changes in cell size affect changes in absolute expression levels? (iii) do changes in gene expression at the translational level buffer, augment, or match changes at the transcriptional level?, (iv) what classes of genes or pathways are altered in the evolved lines, and finally, (v) can we identify mutations responsible for parallel changes in gene expression across replicate populations? Results We generated RNA-seq and Ribo-seq datasets for single clones grown in the exponential phase from each of the 12 evolved lines with sequenced genomes in Tenaillon et al., 2016 (see Materials and methods section M1 for specific clone IDs) (Figure 1A). We aligned each clone’s data to its unique genome and considered expression changes of 4131 transcripts from the ancestor. Due to concerns of contamination in our Ara+6 samples, we removed them from further analysis. We averaged between 151 and 1693 deduplicated reads per transcript across the 52 libraries (Figure 1—figure supplement 1A, Supplementary file 1), the distributions of read counts per transcript were similar across lines, replicates, and sequencing methods (Figure 1—figure supplement 1B), and correlations between biological replicates were high (Pearson correlation coefficient R>0.93, Figure 1—figure supplement 1C). We also verified the presence of three-nucleotide periodicity in our Ribo-seq datasets (Figure 1—figure supplement 1D). Previous studies have shown the existence of distinct ecotypes in the Ara-2 population (Plucain et al., 2014; Rozen et al., 2009). 
Based on an analysis of mutations, our Ara-2 clone comes from the L ecotype (see Appendix A1). Our Ara-3 clone can utilize citrate as a carbon source (Cit+). Finally, we note that both ancestral and evolved lines were grown in standard LTEE media supplemented with additional glucose to obtain enough starting material for paired RNA-seq and Ribo-seq samples. We discuss the potential impacts of this difference in the supplement (Appendix A2).

Figure 1. Parallel changes in mRNA abundances. (A) Schematic diagram of the experimental setup. (B) Pairwise Pearson correlations based on log10(TPM) (where transcripts per million [TPM] is the mean from replicates) separated by comparisons between evolved lines or from ancestors to evolved lines. p-Values indicate the results of a Kolmogorov-Smirnov (KS) test. For differentially expressed genes (DESeq2 q ≤ 0.01), evolved lines were compared using the union of the significant genes from each line. When comparisons were between an evolved line and an ancestor, the significant genes from that evolved line were used. (C) Pairwise Spearman’s correlations based on fold-changes from all genes, and the union of the significant genes between two evolved lines (differentially expressed). (D) Fold-changes of differentially expressed genes that were significantly altered in at least one line. Genes are ordered left to right in increasing mean fold-change across all evolved lines. Genes containing deletions are not assigned a fold-change and are represented as gray spaces. Lines with a mutator phenotype are in red. (E) The upper panel shows the number of genes (y-axis) that were both statistically significant and had a fold-change in the same direction in a particular number of lines (x-axis). The bottom panel shows the expected (dashed) and observed (solid) probability of observing a particular result. 
p-Values are the result of a KS test between the observed and expected distributions. (F) Principal component analysis (PCA) based on all fold-changes. In this case, genes with some form of deletion (complete or indel) are assigned a fold-change of –10 to indicate severe downregulation because they are either completely absent from the genome or not expected to produce functional proteins. Evolved lines show parallel transcriptomic changes Gene expression levels are similar across evolved lines Across the six evolved lines with non-mutator phenotypes in LTEE, we observe a modest degree of parallelism in genetic changes. We find that 22 genes share mutations in two or more evolved lines (Tenaillon et al., 2016). However, it remains unclear whether these parallel genetic changes are sufficient to explain the high degree of parallelism in fitness gains over 50,000 generations. We hypothesize that the evolved lines demonstrate a higher degree of parallel transcriptomic changes despite having unique genomes. To test this hypothesis, we compared the ancestors’ and evolved lines’ mRNA abundances (measured in transcripts per million [TPM]). We find that the expression levels of most genes remain unchanged, leading to high correlations between ancestral and evolved strains (Spearman’s correlation coefficient r>0.95, Figure 1B). Moreover, pairwise correlations between evolved strains were only marginally higher than the correlations between evolved strains and the ancestors. However, these increases were not statistically significant (KS test, p-value = 0.28, Figure 1B). This suggests that transcriptomic changes are likely restricted to a small portion of the genome. To more formally test the hypothesis that evolved lines show parallel changes in the transcriptome, we used DESeq2 (Love et al., 2014) to identify differentially expressed genes (DEGs) and quantify expression changes between each evolved line and the ancestor (for full results, see Supplementary file 2). 
A gene was considered differentially expressed between the evolved line and the ancestor if it reached a statistical threshold of q-value ≤0.01. We find that most fold-changes were small (Figure 1—figure supplement 2A) and consistent with our expectations; only a small proportion of the transcriptome was significantly altered (Figure 1—figure supplement 2B). On average, ∼270 genes (out of 4131) were differentially expressed in an evolved line across all 11 pairwise comparisons between each evolved line and the ancestor. In total, 2986 genes were differentially expressed, but this consisted of only 1273 unique genes, indicating that many DEGs are shared across evolved lines. The expression levels of these 1273 DEGs were more similar between evolved lines than between an evolved line and its ancestor (Figure 1B). Correlations based on fold-changes for DEGs were higher than those based on all genes (Figure 1C). Fold-changes for the set of 1273 DEGs were generally in the same direction regardless of their statistical significance (Figure 1D). Taken together, this is suggestive of parallelism in the evolution of gene expression across the evolved lines. Quantifying the degree of parallelism of DEGs To test if the number of observed parallel changes in gene expression across evolved lines differs from the number of parallel changes expected by random chance, we estimated the probability distribution representing the expected number of DEGs altered in the same direction given different proportions of up- and downregulated genes in each line. This null distribution is well approximated by the distribution of the sum of independent non-identical binomial random variables (SINIB), which we estimated using the R package sinib (Liu and Quertermous, 2018) by parameterizing the function with the number of up- and downregulated DEGs from each line (Figure 1—figure supplement 2C). 
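The null distribution described above can be illustrated with a short sketch. It computes the probability mass function of a sum of independent, non-identical binomial variables by exact convolution with NumPy; this is a swapped-in technique for illustration and is not a call to the R package sinib. The per-line DEG counts (`up_counts`) are hypothetical, and the sketch assumes that every one of the 4131 genes is equally likely to be upregulated within a given line.

```python
from math import comb

import numpy as np


def binomial_pmf(n, p):
    """PMF of Binomial(n, p) as an array of length n + 1."""
    return np.array([comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)])


def sinib_pmf(sizes, probs):
    """PMF of the sum of independent Binomial(sizes[i], probs[i]) variables."""
    pmf = np.array([1.0])  # an empty sum puts all probability mass on 0
    for n, p in zip(sizes, probs):
        # Adding an independent variable convolves the two PMFs.
        pmf = np.convolve(pmf, binomial_pmf(n, p))
    return pmf


# Hypothetical numbers of upregulated DEGs in four lines (not from the paper).
up_counts = [120, 150, 90, 200]
n_genes = 4131  # transcripts considered in the text

# Assumed per-line probability that any given gene is upregulated.
probs = [c / n_genes for c in up_counts]

# One Bernoulli trial per line: pmf[k] = P(gene is upregulated in exactly k lines).
pmf = sinib_pmf([1] * len(probs), probs)

# Expected number of genes upregulated in exactly k lines under this null.
expected_genes = n_genes * pmf
```

Comparing `expected_genes` with the observed counts per number of lines is the kind of contrast shown in Figure 1E, although the exact parameterization used in the paper may differ from this sketch.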
We find that the number of genes with expression changes in the same direction is significantly higher than expected by chance (KS test, p-value ∼ 0.01, Figure 1E – bottom panel). For example, if DEGs were randomly distributed across all lines, we would expect three genes to share expression changes in five or more lines. Instead, 117 genes are differentially expressed in the same direction in at least five lines.

Magnitude and direction of expression changes

Given the high correlations between expression levels of DEGs between evolved lines, it stands to reason that the correlation between fold-changes of DEGs will be higher than the correlation between fold-changes across all genes. Consistent with these expectations, we find that pairwise correlations between evolved lines of fold-changes in DEGs were higher than those of fold-changes in all genes (Figure 1C). While the number of DEGs varies widely across lines (Figure 1—figure supplement 2B), 7 out of 11 evolved lines have more significantly downregulated DEGs than upregulated (Figure 1—figure supplement 2D, binomial test, p-value <0.05). Furthermore, the magnitude of fold-changes of downregulated DEGs was significantly higher than fold-changes of upregulated DEGs in all 11 evolved lines (Figure 1—figure supplement 2D, KS test, p-value <0.0001).

Variation in expression changes across evolved lines

So far, we have considered the degree of parallelism in expression level changes across the evolved lines. However, the evolved lines differ not only in terms of their underlying mutations (Tenaillon et al., 2016) but also vary substantially at the phenotypic level. For instance, half of the evolved lines have developed a mutator phenotype, causing them to accumulate orders of magnitude more mutations than the non-mutator lines. Unlike the other 11 evolved lines, Ara-3 can utilize citrate as a carbon source (Blount et al., 2012), and Ara-2 has developed distinct, coexisting ecotypes (Rozen et al., 2009). 
We wanted to characterize how phenotypic variation across evolved lines might correlate with variation in expression levels. Principal component analysis (PCA) based on all fold-changes mainly separates Ara-3 from the rest of the lines, whereas PC2 appears to separate at least some of the mutators from the non-mutators (Figure 1F). Variation in PC1 and PC2 seems primarily driven by deletions (Figure 1—figure supplement 2E), coded as downregulated genes (log2 fold-change = –10) in this analysis. The magnitude of encoded fold-changes of the deleted genes did not affect the groupings of the PCA between log2(fold-change) –1 and –10. Given the unique circumstances in Ara-3 and Ara-2, it is not surprising that these lines group separately from the others in the PCA. Evolved lines are larger in cell size and carry more mRNAs In the previous section, we discussed how changes in relative gene expression patterns across the evolved lines are similar. However, all evolved lines are significantly larger than their ancestors (Grant et al., 2021; Lenski and Mongold, 2000; Mongold and Lenski, 1996). Typically, bacterial cell volume depends on nutrient availability and growth rate (Chien et al., 2012; Schaechter et al., 1958; Taheri-Araghi et al., 2015) and the increase in cell volume in evolved lines appears to be under selection rather than solely due to increases in growth rate (Mongold and Lenski, 1996; Philippe et al., 2009). As a result of these larger sizes, the cells in evolved lines have higher biomass and proportionally higher nucleic acid levels than the ancestors (Turner et al., 2017). Therefore, it is reasonable to expect that absolute abundances of mRNA molecules per cell should also increase with cell volume to maintain concentrations and reaction rates (Padovan-Merhar et al., 2015). To get a complete picture of transcriptional changes, we also quantified absolute changes in mRNA abundances. 
We used phase-contrast microscopy to measure cell shape and estimate cell volume to confirm that our clones from evolved lines were larger than their ancestors (see Appendix A3). Consistent with earlier studies, we find that each evolved line is larger in volume compared to its ancestors (Figure 2A, Supplementary file 3). Our volume estimates are also consistent with measurements obtained using a Coulter counter from a recent study (Grant et al., 2021; Figure 2—figure supplement 1A, Pearson correlation coefficient R=0.87). Next, we estimated the absolute abundances of transcripts per CFU by comparison to known standards in our sequencing libraries. Specifically, we added the ERCC spike-in controls (Baker et al., 2005; External RNA Controls Consortium, 2005) to our sequencing libraries and used a linear model to relate the number of molecules of a spike-in RNA added to its TPM in each sample. We find a linear relationship between molecules added and estimated TPM across all samples and replicates (Figure 2B, Figure 2—figure supplement 2A, Supplementary file 5). Finally, we measured the number of cells used in the generation of each sequencing library by counting colony-forming units (CFUs) from each culture and accounting for sampling at each step of the library preparation (Supplementary file 4). Note that due to various factors, our estimates of CFU are likely underestimates (see Appendix A3, Figure 2—figure supplement 1C). Nonetheless, our gene-specific estimates of absolute abundances per CFU are highly similar across biological replicates (R>0.93). Together, this allows us to measure absolute RNA abundance per CFU.

Figure 2. Evolved lines are larger in cell size and carry more mRNAs. (A) All evolved lines are larger than the ancestral strain. 
Distributions of cellular volume as determined by phase-contrast microscopy and assuming sphero-cylindrical shape of Escherichia coli along with representative images for each line. Numbers underneath a line’s name indicate the total number of cells imaged (scale bar is 10 µm). The dashed line indicates the ancestral median, p-values indicate the results of a t-test when each line is compared to the ancestor, **** p ≤ 0.0001. Lines listed in red have mutator phenotypes. (B) Abundances of spike-in RNA control oligos are correlated with their estimates in sequencing data. Linear models relating the number of molecules of each ERCC control sequence added to their RNA-seq TPM (transcripts per million) in Ara+1 RNA-seq sample (see Figure 2—figure supplement 2 for data for all lines). (C) Most genes have a higher absolute expression in evolved lines. Changes in the absolute number of mRNA molecules per CFU (colony-forming unit) in the 50,000th generation of Ara+1 relative to the ancestor. The values plotted are the averages between two replicates of the evolved lines and both replicates from two ancestors (REL606 and REL607; see Figure 2—figure supplement 2 for all lines). (D) Absolute changes in mRNA abundances of genes in evolved lines are significantly larger than the variation between biological replicates (KS test, p<0.0001 in all cases). Pink distributions indicate gene-specific fold-changes between biological replicates for each line (centered around 1). Purple distributions show the absolute fold-changes in molecules of RNA per CFU from the ancestor to each evolved line. Fold-changes are calculated in the same manner as in C. (E) Larger evolved lines have more mRNA per CFU. Relationship between the median cellular volume for each line and the total number of RNA molecules per CFU. Total molecules of RNA are calculated as the sum of the average number of molecules for each gene between replicates. 
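The spike-in calibration described above can be sketched in a few lines. The example fits a straight line between the TPM of each spike-in and the number of molecules added, then uses that line to convert a gene's TPM into molecules per CFU. Fitting on a log10 scale is an assumption of this sketch, as are the function names and every number below; the data are synthetic and are not ERCC concentrations or values from the paper.

```python
import numpy as np


def fit_spike_in(tpm, molecules):
    """Least-squares line: log10(molecules) = slope * log10(TPM) + intercept."""
    slope, intercept = np.polyfit(np.log10(tpm), np.log10(molecules), deg=1)
    return slope, intercept


def molecules_per_cfu(gene_tpm, slope, intercept, n_cfu):
    """Convert gene TPM to molecules in the library, then divide by the CFU count."""
    total_molecules = 10 ** (slope * np.log10(gene_tpm) + intercept)
    return total_molecules / n_cfu


# Synthetic spike-ins: TPM is exactly proportional to the molecules added.
spike_molecules = np.array([1e4, 1e5, 1e6, 1e7])
spike_tpm = spike_molecules * 2e-4  # TPM of 2, 20, 200, 2000

slope, intercept = fit_spike_in(spike_tpm, spike_molecules)

# A hypothetical gene at 50 TPM in a library prepared from 1e5 CFUs.
per_cfu = molecules_per_cfu(np.array([50.0]), slope, intercept, n_cfu=1e5)
```

Because the calibration is fit per sample, differences in sequencing depth between libraries are absorbed by the slope and intercept rather than carried into the per-CFU estimates.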
We find that most genes have increased mRNA abundance per CFU compared to the ancestor (Figure 2C, Figure 2—figure supplement 2B, Supplementary file 6) and that these differences were significantly larger than the differences between biological replicates (Figure 2D). Furthermore, the increases in total mRNA abundance scale with cellular volume, with larger evolved lines having more molecules per typical cell volume (Figure 2E). This suggests that the evolved lines have more mRNA per cell than the ancestors. Such an increase may be needed to maintain reaction rates in the face of increasing cell volumes. Another hypothesis is that stockpiling resources like mRNA and ribosomes might allow evolved lines to reduce the time spent in the lag phase after transfer to fresh medium. Indeed, reduced lag times occur in the LTEE (Vasi et al., 1994), and simulations suggest that bacteria can evolve to ‘anticipate’ the regular transfer to fresh medium in a serial transfer regime (van Dijk et al., 2019). Transcriptional changes drive translational changes While mRNA abundances are an important molecular phenotype potentially linking genomic changes to adaptations, changes in mRNA abundances can themselves be buffered or augmented at other downstream regulatory processes such as translation (Albert et al., 2014; Artieri and Fraser, 2014; McManus et al., 2014). Translational regulation affects the rate at which an mRNA produces its protein product, and mRNAs vary widely in their translation efficiencies in both eukaryotes and prokaryotes (Ingolia et al., 2009; Li et al., 2014; Picard et al., 2012). However, the role of changes in translational regulation during adaptation and speciation remains poorly understood and, at least in yeast, is heavily debated (Albert et al., 2014; Artieri and Fraser, 2014; McManus et al., 2014). Moreover, because translation occupies the majority of cellular resources (Bernier et al., 2018), it may be a prime target for evolution in the LTEE. 
To study translational changes in LTEE, we performed Ribo-seq in the evolved lines and their ancestors (Figure 1A). We find that changes in ribosome densities are highly correlated with changes in mRNA abundances (Figure 3A, Figure 3—figure supplement 1A). This is somewhat surprising because changes in environmental conditions and small genetic perturbations usually result in large changes at the translational level (Gerashchenko et al., 2012; Rubio et al., 2021; Woolstenhulme et al., 2015). Despite the high correlation between mRNA and ribosome footprint fold-changes at the genomic level, individual genes might have altered ribosome densities. We used Riborex to quantify changes in ribosome densities (Li et al., 2017). Riborex quantifies changes in footprint densities while accounting for any changes in mRNA abundances. We considered a gene significantly altered if it reached a q-value ≤0.01. Only a handful of genes have altered ribosome densities, and none are shared between three or more lines (Figure 3B, Supplementary file 7). This suggests that over the course of the LTEE, most changes happen at the transcriptional level with insufficient evidence for significant changes at the translational level. We note that earlier studies have indicated that Riborex has limited power to detect small to moderate shifts in ribosome densities based on simulated data (Li et al., 2017). Although comparing these simulations to our data is difficult, it is possible that we are failing to detect some of these smaller shifts in gene-specific ribosome densities. Regardless, our results indicate a greater role for changes in factors regulating mRNA abundances than factors regulating mRNA translation.

Figure 3. Changes in gene expression at the translational level. (A) Translational changes are correlated with transcriptional changes. 
The relationship between RNA-seq and Ribo-seq fold-changes in Ara+1 (see Figure 3—figure supplement 1A for all evolved lines). (B) The distribution of genes with significantly altered ribosome densities (q≤0.01), estimated using Riborex. (C) Evolved lines have faster translation termination. Stop codons had lowered ribosome density compared to all sense codons. Changes in codon-specific ribosome densities in each of the evolved lines relative to the ancestor. Codons are colored according to the amino acid they code for. Amino acids are ordered left to right in order of mean fold-change across the lines. (D) Fold-changes in mRNA abundances of translation termination factors and related genes ykfJ, prfH, prfA, prmC, prfB, fusA, efp, prfC. RNA-seq fold-changes are shown for termination factors; asterisks indicate DESeq2 q-values (blank: p>0.05, *: p≤0.05, **: p≤0.01, ***: p≤0.001, ****: p≤0.0001) and an ‘M’ indicates an SNP in that gene.

While Riborex can find gene-specific changes in ribosome densitie
