Abstract

Engineered RNA elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids. Predicting the behavior of these synthetic biology components remains a challenge, one that could be addressed through the enhanced pattern recognition offered by deep learning. Here, we investigate deep neural networks (DNNs) to predict toehold-switch function as a canonical riboswitch model in synthetic biology. To facilitate DNN training, we synthesize and characterize in vivo a dataset of 91,534 toehold switches spanning 23 viral genomes and 906 human transcription factors. DNNs trained on nucleotide sequences outperform (R² = 0.43–0.70) previous state-of-the-art thermodynamic and kinetic models (R² = 0.04–0.15) and allow for human-understandable attention visualizations (VIS4Map) to identify success and failure modes. This work shows that deep-learning approaches can be used for functionality predictions and insight generation in RNA synthetic biology.

Highlights

  • Engineered ribonucleic acid (RNA) elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids

  • A fundamental hurdle in applying deep-learning techniques to RNA synthetic biology systems is the limited size of currently published datasets, which are notably smaller than typical dataset sizes required for the training of deep network architectures in other fields[10,18,19,20,21]

  • The two libraries were sorted on a fluorescence-activated cell sorter (FACS) using four bins (Fig. 1 and Supplementary Figs. 1d, e, 2a), and the toehold-switch variants contained in each bin were quantified using next-generation sequencing (NGS) to recover their individual fluorescence distributions from raw read counts (Fig. 1)
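As a rough illustration of how per-bin read counts can be turned into a fluorescence distribution, the sketch below normalizes the counts of one variant across four bins and summarizes them as a single score between 0 and 1. The function names, the equal spacing of bins, and the omission of any per-bin depth or cell-fraction correction are assumptions made for illustration; this is not the authors' pipeline.

```python
def bin_distribution(read_counts):
    """Normalize raw per-bin read counts into fractions that sum to 1.

    Assumption: counts are used as-is, with no correction for sequencing
    depth or for the fraction of cells sorted into each bin.
    """
    total = sum(read_counts)
    if total == 0:
        raise ValueError("variant has no reads in any bin")
    return [count / total for count in read_counts]


def mean_bin_score(read_counts):
    """Summarize a variant as the distribution-weighted mean bin position.

    Assumption: bins are equally spaced and mapped onto [0, 1], with the
    lowest-fluorescence bin at 0 and the highest at 1.
    """
    dist = bin_distribution(read_counts)
    last = len(dist) - 1
    return sum(frac * (i / last) for i, frac in enumerate(dist))


if __name__ == "__main__":
    # Hypothetical read counts for one variant across four FACS bins.
    counts = [5, 0, 0, 5]
    print(bin_distribution(counts))
    print(mean_bin_score(counts))
```

A variant whose reads fall entirely in the brightest bin scores 1, and one found only in the dimmest bin scores 0; intermediate values reflect how the reads are spread across bins.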


Summary

Introduction

Engineered RNA elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids. Toehold switches are a class of versatile prokaryotic riboregulators inducible by the presence of a fully programmable trans-RNA trigger sequence[2,3,4,5,6,15,16]. These RNA synthetic biology modules have displayed impressive dynamic range and orthogonality when used both in vivo as genetic circuit components[2,5,6] and in vitro as nucleic acid diagnostic tools utilizing cell-free protein synthesis (CFPS) systems[3,4,15,16]. We enhance the transparency of our deep-learning approach by utilizing a nucleotide complementarity matrix input representation to visualize important learned secondary-structure patterns in selected models. This attention-visualization technique, which we term VIS4Map (Visualizing Secondary Structure Saliency Maps), allows us to identify RNA module success and failure modes by discovering secondary structures that our deep-learning model uses to accurately predict toehold-switch function. The resulting dataset, models, and visualization analysis (Fig. 1) represent a substantial step forward for the validation and interpretability of high-throughput approaches to designing RNA synthetic biology tools, surpassing the limits of current mechanistic RNA secondary-structure modeling.
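To make the idea of a nucleotide complementarity matrix concrete, the following minimal sketch builds a binary matrix in which entry (i, j) is 1 when bases i and j of a sequence could pair. The choice of a 0/1 encoding, the inclusion of G-U wobble pairs, and the function name are assumptions for illustration; the encoding actually used in the paper may differ.

```python
# Hypothetical pairing rules: Watson-Crick pairs plus G-U wobble (an assumption).
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def complementarity_matrix(seq):
    """Return an n x n list of lists with 1 where seq[i] can pair with seq[j].

    DNA-style input is accepted by converting T to U. This is an illustrative
    encoding, not the authors' implementation.
    """
    rna = seq.upper().replace("T", "U")
    n = len(rna)
    return [
        [1 if (rna[i], rna[j]) in PAIRS else 0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in complementarity_matrix("GCAU"):
        print(row)
```

In a matrix of this kind, a potential stem appears as a run of 1s along an anti-diagonal, which is what makes saliency maps over the input readable as secondary-structure features.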

