Abstract

Self-cleaving ribozymes are RNA molecules that catalyze the cleavage of their own phosphodiester backbones. These ribozymes are found in all domains of life and are also a tool for biotechnical and synthetic biology applications. Self-cleaving ribozymes are also an important model of sequence-to-function relationships for RNA because their small size simplifies synthesis of genetic variants and self-cleaving activity is an accessible readout of the functional consequence of the mutation. Here, we used a high-throughput experimental approach to determine the relative activity for every possible single and double mutant of five self-cleaving ribozymes. From this data, we comprehensively identified non-additive effects between pairs of mutations (epistasis) for all five ribozymes. We analyzed how changes in activity and trends in epistasis map to the ribozyme structures. The variety of structures studied provided opportunities to observe several examples of common structural elements, and the data was collected under identical experimental conditions to enable direct comparison. Heatmap-based visualization of the data revealed patterns indicating structural features of the ribozymes including paired regions, unpaired loops, non-canonical structures, and tertiary structural contacts. The data also revealed signatures of functionally critical nucleotides involved in catalysis. The results demonstrate that the data sets provide structural information similar to chemical or enzymatic probing experiments, but with additional quantitative functional information. The large-scale data sets can be used for models predicting structure and function and for efforts to engineer self-cleaving ribozymes.
Editor's evaluation

This is a valuable study that provides compelling evidence for important nucleotides in five self-cleaving ribozymes. Epistasis analyses are novel in this field. https://doi.org/10.7554/eLife.80360.sa0

Introduction

Challenges with predicting the functional effects of changing an RNA sequence continue to limit the study and design of RNA molecules. Recently, machine learning approaches have made considerable advancements in predicting an RNA structure from a sequence. However, these approaches rely heavily on crystal structures of RNA molecules and sequence conservation of homologs, both of which are limited for RNA molecules compared to proteins (Calonaci et al., 2020; Townshend et al., 2021). In addition, describing an RNA molecule as a single structure can be inaccurate, and regulatory elements such as riboswitches demonstrate the importance of an ensemble of structures for an RNA function. It is unclear whether predictions based on individual structures alone will be able to predict the functional effects of mutations with the precision needed for many biotechnical and synthetic biology applications, or to predict disease-associated mutations in RNA molecules (Halvorsen et al., 2010). This suggests that new experimental data types might be important for understanding, designing, and manipulating the transcriptome.

Self-cleaving ribozymes provide a useful model to study sequence-structure-function relationships in RNA molecules. Self-cleaving ribozymes are catalytic RNA molecules that cleave their own phosphodiester backbone. They were first discovered in viruses and viroids, but numerous families of self-cleaving ribozymes have since been discovered in all domains of life (Prody et al., 1986). The CPEB3 ribozyme, for example, was discovered in the human genome and found to be highly conserved in mammals (Bendixsen et al., 2021; Salehi-Ashtiani et al., 2006).
Other self-cleaving ribozymes, such as the hammerhead and twister ribozymes, are found broadly distributed across eukaryotic and prokaryotic genomes (Perreault et al., 2011; Roth et al., 2014). The biological roles of ribozymes in different genomes and different genetic contexts remain an active area of investigation (Jimenez et al., 2015). In addition to being widespread across the tree of life, self-cleaving ribozymes have also been used for several bioengineering applications (Liang et al., 2011; Peng et al., 2021; Wei and Smolke, 2015; Zhong et al., 2016). For example, self-cleaving ribozymes are being combined with aptamers to develop synthetic gene regulatory devices, which have biotechnical and biomedical applications where ligand-dependent control of gene expression is desired (Kobori et al., 2017; Kobori et al., 2015; Stifel et al., 2019; Townshend et al., 2015). The testing of mutational effects in ribozyme sequences has been accelerated by high-throughput experimental approaches. Most self-cleaving ribozymes are fairly small (<200 nt), and genetic variants can be made by chemical synthesis of a single DNA oligonucleotide that is then used as a template for in vitro transcription. The self-cleavage activity of the ribozyme requires a precise three-dimensional structure, and therefore activity can be used as a sensitive indirect readout of native structure. Mutations that disrupt the native structure are detected as reduced activity compared to the unmutated ‘wild-type’ ribozyme. Several methods have been developed to enable the detection of ribozyme function by high-throughput sequencing of biochemical reactions (Bendixsen et al., 2019; Hayden, 2016; Kobori and Yokobayashi, 2016; Shen et al., 2021). For self-cleaving ribozymes, each read from the data reports both the mutations and whether or not that molecule was reacted (cleaved) or unreacted (uncleaved). 
Therefore, high-throughput sequencing allows numerous genetic variants to be pooled together and still observed hundreds to thousands of times in the data. This provides confidence in the fraction cleaved (FC) for each genetic variant in a given experiment, and genetic variants are compared to determine relative activity (RA). Importantly, the data are internally controlled because both reacted and unreacted molecules are observed, which controls for differences in their abundance due to synthesis steps (chemical DNA synthesis, transcription, reverse-transcription, and PCR).

A common approach to confirm structural interactions in RNA and proteins is through analysis of pairs of mutations (Dutheil et al., 2010; Olson et al., 2014). In this context, it can be useful to calculate pairwise epistasis, which measures deviations in the mutational effects of double mutants relative to the effects of each individual mutation (assuming an additive model of mutational effects). For example, in the case of a base pair, each single mutation would disrupt the base-pairing interaction, destabilizing the catalytically active RNA structure and reducing activity. However, if two mutations together restore a base pair, the double mutant would have a much higher RA than expected from the additive effects of the individual mutations (positive epistasis). In contrast to paired nucleotides, double mutants at non-paired nucleotides tend to have a more reduced activity than expected from each individual mutation (negative epistasis) (Bendixsen et al., 2017; Li et al., 2016). In the case of two mutations that create a different base pair (e.g., G-C to A-U), it is known that the stacking with neighboring base pairs is also structurally important, and some base pair substitutions will not be equivalent in a given structural context. This creates a range of possible epistatic effects even for two mutations at paired nucleotide positions.
In addition, some non-canonical base interactions within tertiary contacts may also show epistasis even when they do not involve Watson-Crick or GU wobble base-pairing interactions. Nevertheless, the propensity for positive epistasis between physically interacting nucleotides suggests that a comprehensive evaluation of pairwise mutational effects should contain considerable structural information. Here, we report comprehensive analysis of mutational effects for all single and double mutants for five different self-cleaving ribozymes. RA effects of all single and double mutations were determined by high-throughput sequencing of co-transcriptional self-cleavage reactions, and this data was used to calculate epistasis between pairs of mutations. The ribozymes studied include a mammalian CPEB3 ribozyme, a hepatitis delta virus (HDV) ribozyme, a twister ribozyme from Oryza sativa, a hairpin ribozyme derived from the satellite RNA from tobacco ringspot virus, and a hammerhead ribozyme (Bendixsen et al., 2021; Burke and Greathouse, 2005; Chadalavada et al., 2007; Liu et al., 2014; Müller et al., 2012). For each reference ribozyme, a single DNA oligo template library was synthesized with 97% wild-type nucleotides at each position, and 1% of each of the three other nucleotides. This mutagenesis strategy was expected to produce all possible single and double mutants, as well as a random sampling of combinations of three or more mutations. The mutagenized templates were transcribed in vitro, all under identical conditions, where active ribozymes had the opportunity to self-cleave co-transcriptionally. All ribozyme constructs studied cleave near the 5′-end of the RNA, and a template switching reverse transcription protocol was used to append a common primer binding site to both cleaved and uncleaved molecules. Subsequently, low-cycle PCR was used to add indexed Illumina adapters for high-throughput sequencing. 
Each mutagenized ribozyme template was transcribed separately and in triplicate, and amplified with unique indexes so that all replicates could be pooled and sequenced together on an Illumina sequencer. The sequencing data was then used to count the number of times each unique sequence was observed as cleaved or uncleaved, and this data was used to calculate the FC. The FC of single and double mutants was normalized to the unmutated reference sequence to determine RA. The RA values of the single and double mutants were used to calculate all possible pairwise epistatic interactions in all five ribozymes. We mapped epistasis values to each ribozyme structure to evaluate correlations between structural elements and patterns of pairwise epistasis values. The results indicated that structural features of the ribozymes are revealed in the data, suggesting that these data sets will be useful for developing models for predicting sequence-structure-function relationships in RNA molecules.

Results and discussion

Epistatic effects in paired nucleotide positions show stability-dependent signatures

To evaluate how the effects of mutations mapped to the ribozyme structures, we plotted the RA values as heatmaps, similar to previous publications by others (Andreasson et al., 2020; Kobori and Yokobayashi, 2016; Figures 1–5, panel A, large plot). We then used this data to calculate epistasis between pairs of mutations. We first inspected nucleotide positions known to be involved in base-paired regions of the secondary structure of each ribozyme. In this heatmap layout, many paired regions showed an anti-diagonal line of high-activity double mutant variants with strong positive epistasis (Figures 1–5, insets, red to blue plots). In addition, pairs of mutations off the anti-diagonal tended to show negative or non-positive epistasis.
Pseudoknot elements that involve Watson-Crick base pairs also showed this pattern, including the single base pair T1 element in CPEB3 (Figure 1) and the two base pair T1 element in HDV (Figure 2). The layout of mutations in the heatmap places paired nucleotide positions along the anti-diagonal, and compensatory double mutants that change one Watson-Crick base pair to another are found on this anti-diagonal. Individual mutations that break a base pair will often reduce ribozyme activity, but the activity can be restored by a second compensatory mutation, resulting in positive epistasis. In contrast, double mutants off-diagonal usually disrupt two base pairs (unless they result in a GU wobble base pair). It is expected that breaking two base pairs in the same paired region would be more deleterious to ribozyme activity than breaking one base pair. The epistasis data indicates that two non-compensatory mutations in the same paired region are more deleterious than expected from an additive assumption, and frequently create negative epistasis off-diagonal within paired regions.

Figure 1. Effects of mutations and pairwise epistasis in a CPEB3 ribozyme. (A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of a mammalian CPEB3 ribozyme. Base-paired regions P1, P2, P3, P4, and T1 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistatic interactions observed for each paired region are shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes (A is reproduced from Figure 1B from Beck et al., 2022).
(B) Secondary structure of the CPEB3 ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the CPEB3 ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because the two mutations create a new Watson-Crick base pair or GU wobble (WC/GU).

Figure 2. Effects of mutations and pairwise epistasis in an HDV self-cleaving ribozyme. (A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of an HDV ribozyme. Base-paired regions P1, P2, P3, P4, and T1 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistatic interactions observed for each paired region are shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes. (B) Secondary structure of the HDV ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the HDV ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because the two mutations create a new Watson-Crick base pair or GU wobble (WC/GU). HDV, hepatitis delta virus.

Figure 3. Effects of mutations and pairwise epistasis in a twister self-cleaving ribozyme.
(A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of a twister ribozyme. Base-paired regions P2, P4, T1, and T2 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistatic interactions observed for each paired region are shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes. (B) Secondary structure of the twister ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the twister ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because the two mutations create a new Watson-Crick base pair or GU wobble (WC/GU).

Figure 4 (with 1 supplement). Effects of mutations and pairwise epistasis in a hairpin self-cleaving ribozyme. (A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of a hairpin ribozyme. Base-paired regions P1, P2, and P3 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistatic interactions observed for each paired region are shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes.
(B) Secondary structure of the hairpin ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the hairpin ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because the two mutations create a new Watson-Crick base pair or GU wobble (WC/GU). (D) The distributions of epistasis values in all terminal stem loops across all five ribozymes, and epistasis observed within loop A, loop B, and between loop A and loop B in the hairpin ribozyme.

Figure 5 (with 1 supplement). Effects of mutations and pairwise epistasis in a hammerhead self-cleaving ribozyme. (A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of a hammerhead ribozyme. Base-paired regions P1 and P2 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistatic interactions observed for each paired region are shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes. (B) Secondary structure of the hammerhead ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the hammerhead ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because the two mutations create a new Watson-Crick base pair or GU wobble (WC/GU).
(D) Crystal structure of a hammerhead ribozyme (3ZD5) with C20 and G25 indicated (orange) and hydrogen bonds between the nucleotides shown as yellow dashed lines.

To further evaluate epistasis within base-paired regions, we separated epistasis data into three categories based on the number of base pairs that the mutations disrupt. For each ribozyme, we plotted the distribution of epistasis values as violin plots (Figures 1–5, panel C). For all ribozymes, the analysis revealed a clear trend. On average, disrupting two base pairs resulted in negative epistasis (mean of distribution), disrupting one base pair shifted the distribution toward more positive epistasis values, and the highest epistasis values (mean and max) were found for double mutants that result in zero disrupted base pairs because the two mutations together create a new Watson-Crick or GU wobble pair. This trend was observed for paired regions in every ribozyme, and in all cases the distributions were significantly different (p<0.05 to p<0.001, Mann-Whitney U test). This pattern of epistasis in paired regions demonstrates the potential for identifying base-paired regions in RNA structures using comprehensive double-mutant activity data.

To further evaluate the potential of epistasis data to identify base-paired regions, we analyzed the epistasis values for each paired region individually. For this analysis, we separated the epistasis values calculated for double mutants that result in a Watson-Crick base pair (‘on-diagonal’ in heatmaps) from all other double mutants (‘off-diagonal’ in heatmaps) in each paired region (Figure 6). Short paired regions showed the largest differences in the distributions of epistatic effects for on-diagonal and off-diagonal double mutants, while longer paired regions showed small differences in these distributions. For example, the short paired regions P3 in CPEB3 and HDV (3 bp) and T1 in the twister ribozyme (4 bp) showed very large differences in the means of the distributions.
These small regions were highly sensitive to individual mutations, and most pairs of mutations within these regions resulted in almost no detectable activity except when they created a different Watson-Crick base pair, leading to the large positive epistasis values (Figures 1–5). In addition, in these short paired regions, we do not see strong negative epistasis. It appears that the strong deleterious effect of a single mutation in these short regions makes a second mutation no more disruptive to activity, resulting in a mean of the distribution near zero for double mutants off-diagonal. In contrast, the largest paired region (HDV P4, 14 bp) showed a very small difference between the distribution of epistasis values found on-diagonal and off-diagonal. This can be rationalized because losing one base pair was not deleterious to the HDV ribozyme activity under our experimental conditions (Figure 2), and this does not allow for positive epistasis upon a second mutation. Even the loss of two base pairs in P4 was somewhat tolerated, leading to very little negative epistasis for two mutations at unpaired positions. Taken together, the results are consistent with other observations in both RNA and proteins, where the effects of mutations, and their additivity, have been shown to depend on the local thermodynamics of the structured region (Kraut et al., 2003; Moody and Bevilacqua, 2003).

Figure 6 (with 3 supplements). Distributions of epistasis values calculated for individual paired regions in all five ribozymes. For each region, epistasis values were separated into double mutants that restore a Watson-Crick base pair (‘on-diagonal’, blue) and all other double mutants (‘off-diagonal’, gray). The mean of each distribution (µ) is reported and indicated by the dashed line. The p value is the probability that values were drawn from the same distribution by chance (Mann-Whitney U test).
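The quantities used in the analysis of paired regions (FC, RA, pairwise epistasis, and the mismatch categories of Figures 1–5, panel C) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details taken from the text: the epistasis formula log10(RA_AB / (RA_A × RA_B)), which is one common way to express deviation from an additive model on a log scale; the `floor` pseudo-value that keeps the logarithm defined for inactive variants; and all function names and the toy hairpin sequence, which are hypothetical.

```python
import math

# Watson-Crick and GU wobble pairs, written as ordered (5' base, 3' base) tuples.
WC_OR_GU = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fraction_cleaved(cleaved, uncleaved):
    """FC for one variant from its cleaved and uncleaved read counts."""
    total = cleaved + uncleaved
    if total == 0:
        raise ValueError("variant was not observed")
    return cleaved / total


def relative_activity(fc_variant, fc_wild_type):
    """RA: FC of a variant normalized to the FC of the reference sequence."""
    return fc_variant / fc_wild_type


def epistasis(ra_a, ra_b, ra_ab, floor=1e-3):
    """Assumed formula: log10(RA_AB / (RA_A * RA_B)).

    Positive values mean the double mutant is more active than expected
    from the two single mutants. `floor` is a hypothetical lower bound
    that avoids taking the log of zero.
    """
    ra_a, ra_b, ra_ab = (max(x, floor) for x in (ra_a, ra_b, ra_ab))
    return math.log10(ra_ab / (ra_a * ra_b))


def classify_double_mutant(seq, pairs, mut_a, mut_b):
    """Return 'WC/GU', '1 Mismatch', or '2 Mismatch' for a double mutant.

    seq    : wild-type RNA sequence (string over A, C, G, U)
    pairs  : list of (i, j) zero-based indices of paired positions
    mut_a, mut_b : (position, new_nucleotide) tuples
    """
    mutated = list(seq)
    for pos, nt in (mut_a, mut_b):
        mutated[pos] = nt
    touched = {mut_a[0], mut_b[0]}
    # Count base pairs that involve a mutated position and no longer pair.
    mismatches = sum(
        1
        for i, j in pairs
        if (i in touched or j in touched)
        and (mutated[i], mutated[j]) not in WC_OR_GU
    )
    return {0: "WC/GU", 1: "1 Mismatch"}.get(mismatches, "2 Mismatch")


# Toy hairpin (hypothetical): a 3 bp stem closed by an AAAA loop.
toy_seq = "GCGAAAACGC"
toy_pairs = [(0, 9), (1, 8), (2, 7)]
```

With the toy hairpin, `classify_double_mutant(toy_seq, toy_pairs, (0, "C"), (9, "G"))` falls in the WC/GU category because the G-C pair is swapped for C-G, whereas mutating positions 0 and 1 to A breaks two different pairs. Counting mismatches on the mutated sequence also treats a G-C to G-U change as undisrupted, which follows the WC/GU category described in the figure legends.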
To explicitly investigate the influence of thermodynamic stability on mutational effects in the data, we calculated the minimum free energy for each paired region and compared mutational effects. We split each paired region into two separate RNA sequences that contained only the base-paired nucleotides, eliminating loop nucleotides, and used nearest neighbor rules to calculate the minimum free energy of their interaction (NUPACK). This approach neglects thermodynamic contributions from terminal loops, but allowed for a consistent approach to compare internal and terminal paired regions. We found a significant negative correlation between the median deleterious effects of single mutations and the minimum free energy of the paired regions (Figure 6—figure supplement 1). Clearly, though, thermodynamic stability alone does not explain every mutational effect. For example, CPEB3 P1 is more sensitive to mutations than CPEB3 P2 or P4 even though the latter are less stable. This is likely because P1 is immediately adjacent to the site of self-cleavage, while P2 and P4 are not. Overall, this analysis of thermodynamic stability indicates that for RNAs with unknown structures, more stable structural elements may be harder to identify from epistatic effects alone when there is not a strong deleterious effect of individual mutations. However, it is also possible that more stable elements would show stronger epistasis under different experimental conditions, such as different temperatures or magnesium concentrations (Peri et al., 2022).

Catalytic residues do not have any high-activity mutants

Self-cleaving ribozymes often utilize a concerted acid-base catalysis mechanism where specific nucleobases act as proton donors (acid) or acceptors (base) (Jimenez et al., 2015), and mutations at these positions abolish activity. Analyzing the effects of individual mutations will not distinguish catalytic nucleotides from structurally important nucleotides.
Comprehensive pairwise mutations, on the other hand, can potentially distinguish between catalytic residues that cannot be rescued by a second mutation, and structurally important nucleotides that can be rescued (positive epistasis). The catalytic cytosines of the CPEB3 (C57) and HDV (C75) ribozymes act as proton donors due to perturbed pKa values (Nakano et al., 2000; Skilandat et al., 2016). For the twister ribozyme (Figure 3), the guanosine at position G39 acts as a general base, and the adenosine at position A1 acts as a general acid (Wilson et al., 2016). The catalytic nucleotides for the hammerhead ribozyme (Figure 5) are the guanosines located at positions G25 and G39 (Scott et al., 2013). The hairpin ribozyme (Figure 4) contains catalytic nucleotides at positions G29 and A59 (Wilson et al., 2006). In the RA heatmaps, the columns and rows associated with these nucleotides show low activity values (Figures 1–5, Figure 6—figure supplement 2). It is important to note that because there is complete coverage of all double mutants in this data set, we can be certain that no second mutation within the ribozyme compensates for a mutation at these positions. These results show how catalytic residues can be identified in the comprehensive pairwise mutagenesis data.

Unpaired nucleotides show mutational effects that depend on tertiary structure

Ribozymes with mutations to nucleotides found in terminal loops that are not involved in tertiary structure elements showed high RA for most single and double mutants, and essentially no epistasis. This is not surprising if these loops reside on the periphery of the ribozyme and are not involved in structural contacts with other nucleotides. This is the case for L4 of CPEB3 (Figure 1), L4 of HDV (Figure 2), and L1 and L3 of the hairpin ribozyme (Figure 4). Two mutations within these loops do not reduce activity, and mutations in these loops do not rescue other deleterious mutations such as those that break a base pair.
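Returning to the criterion described above for catalytic residues (no single or double mutant at the position retains high activity), the scan over a complete single- and double-mutant data set can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the dictionary layout, the function names, and the `threshold` of 0.1 RA are all hypothetical, and the toy values are invented for the example.

```python
from collections import defaultdict


def max_activity_by_position(single_ra, double_ra):
    """Highest RA seen among all variants that mutate each position.

    single_ra : {(pos, nt): RA}
    double_ra : {((pos_a, nt_a), (pos_b, nt_b)): RA}
    """
    best = defaultdict(float)
    for (pos, _nt), ra in single_ra.items():
        best[pos] = max(best[pos], ra)
    for ((pos_a, _), (pos_b, _)), ra in double_ra.items():
        for pos in (pos_a, pos_b):
            best[pos] = max(best[pos], ra)
    return dict(best)


def candidate_catalytic_positions(single_ra, double_ra, threshold=0.1):
    """Positions where no single or double mutant reaches `threshold` RA.

    `threshold` is a hypothetical cutoff for 'high activity'.
    """
    best = max_activity_by_position(single_ra, double_ra)
    return sorted(pos for pos, ra in best.items() if ra < threshold)


# Toy data (invented): positions 0 and 1 form a base pair that a
# compensatory double mutant rescues; position 2 is never rescued.
toy_single = {(0, "A"): 0.05, (1, "U"): 0.04, (2, "G"): 0.01}
toy_double = {
    ((0, "A"), (1, "U")): 0.90,
    ((0, "A"), (2, "G")): 0.01,
    ((1, "U"), (2, "G")): 0.02,
}
```

In the toy data, positions 0 and 1 are individually deleterious but are rescued together, so only position 2 is returned by `candidate_catalytic_positions(toy_single, toy_double)`. A peripheral terminal loop position would show the opposite pattern, with high RA for most variants.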
The internal loops LA and LB of the hairpin ribozyme are structurally important (Figure 4). Interactions between nucleotides within LB include six non-Watson-Crick base-pairing interactions that are important for the formation of an active ribozyme structure (Figure 4—figure supplement 1). Several non-canonical base-base and sugar-base hydrogen bonds between nucleotides within LA are also important for the formation of the active site (Fedor, 2000; Wilson et al., 2006). Docking between LA and LB is necessary for the formation of a catalytically active ribozyme and is facilitated by a Watson-Crick base pair between nucleotides numbered G1 and C46 in the version of the ribozyme used here (Rupert and Ferré-D’Amaré, 2001). In contrast to terminal loop regions, most single mutations within LA and LB resulted in low self-cleavage activity in our data (Figure 4). In addition, the double mutants within and between loop A and loop B show several instances of strong positive epistasis (Figure 4—figure supplement 1C), and the distributions of epistasis within and between these loops are significantly different than the terminal loops that are not structurally important (Figure 4D). This positive epistasis indicates that many of the important structural contacts can be achieved by other specific pairs of nucleotides. For example, the double mutant G1C and C46G shows strong epistasis suggesting that swapping a C-G base pair for the G-C base pair can restore activity by facilitating docking between the two loops. Several double mutants at positions that form non-canonical interactions in LB show positive epistasis. For example, mutation A41G shows positive epistasis when the interacting nucleotide C65 is mutated to a G or U. The non-canonical A45:A59 interaction shows positive epistasis for several pairs of mutations (A45U A59C, A45C A59C, and A45G A59U). Finally, the non-canonical base pair A47:G57 in LB, shows positive epistasis for the A47U:G57A double mutant. 
The difference between terminal loops and loops with structural importance highlights how activity-based data can help identify non-canonical structures that are challenging to predict computationally, and that might be difficult to identify by other common approaches, such as chemical probing experiments (Walter et al., 2000). Another example of structurally important unpaired regions can be found in the CUGA uridine turn (U-turn) motif in the hammerhead ribozyme (Figure 5). This CUGA turn forms the catalytic pocket and positions a catalytic cy
