Abstract

Characterizing genotype-phenotype relationships of biomolecules (e.g. ribozymes) requires accurate ways to measure activity for a large set of molecules. Kinetic measurement using high-throughput sequencing (e.g. k-Seq) is an emerging assay applicable in various domains that potentially scales up measurement throughput to over 10^6 unique nucleic acid sequences. However, maximizing the return of such assays requires understanding the technical challenges introduced by sequence heterogeneity and DNA sequencing. We characterized the k-Seq method in terms of model identifiability, effects of sequencing error, accuracy, and precision using simulated datasets and experimental data from a variant pool constructed from previously identified ribozymes. Relative abundance, kinetic coefficients, and measurement noise were found to affect the measurement of each sequence. We introduced bootstrapping to robustly quantify the uncertainty in estimating model parameters and proposed interpretable metrics to quantify model identifiability. These efforts enabled the rigorous reporting of data quality for individual sequences in k-Seq experiments. Here we present detailed protocols, define critical experimental factors, and identify general guidelines to maximize the number of sequences and their measurement accuracy from k-Seq data. Analogous practices could be applied to improve the rigor of other sequencing-based assays.
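
To make the bootstrapping idea concrete, the following is a minimal sketch of case-resampling bootstrap applied to a curve fit. Everything in it is an illustrative assumption rather than the study's procedure: the saturating model A(1 - exp(-k x)), the grid-search fit, the function names, and the synthetic data are all hypothetical, and the resampling scheme actually used for k-Seq is not specified in this text.

```python
import math
import random


def model(x, A, k):
    """Hypothetical saturating curve (assumed for illustration only)."""
    return A * (1.0 - math.exp(-k * x))


def fit(xs, ys, k_grid):
    """Least-squares fit: scan k over a grid, solve A in closed form for each k."""
    best = None
    for k in k_grid:
        f = [1.0 - math.exp(-k * x) for x in xs]
        denom = sum(v * v for v in f)
        if denom == 0.0:
            continue
        A = sum(y * v for y, v in zip(ys, f)) / denom
        sse = sum((y - A * v) ** 2 for y, v in zip(ys, f))
        if best is None or sse < best[0]:
            best = (sse, A, k)
    return best[1], best[2]


def bootstrap_fit(xs, ys, k_grid, n_boot=200, seed=0):
    """Case resampling: refit on data points drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(fit([xs[i] for i in idx], [ys[i] for i in idx], k_grid))
    return estimates


def percentile_interval(values, lo=2.5, hi=97.5):
    """Simple percentile interval taken from the sorted bootstrap estimates."""
    s = sorted(values)

    def pick(p):
        return s[min(len(s) - 1, int(p / 100.0 * len(s)))]

    return pick(lo), pick(hi)


# Synthetic data (made up for this sketch, not from the study)
rng = random.Random(1)
xs = [0.5, 1.0, 2.0, 4.0, 8.0] * 3
true_A, true_k = 0.8, 0.5
ys = [model(x, true_A, true_k) + rng.gauss(0.0, 0.02) for x in xs]
k_grid = [10 ** (-2 + 3 * i / 150) for i in range(151)]  # 0.01 to 10, log-spaced

A_hat, k_hat = fit(xs, ys, k_grid)
boot = bootstrap_fit(xs, ys, k_grid)
A_ci = percentile_interval([a for a, _ in boot])
k_ci = percentile_interval([k for _, k in boot])
print("A estimate:", A_hat, "interval:", A_ci)
print("k estimate:", k_hat, "interval:", k_ci)
```

The spread of the bootstrap estimates gives a per-sequence uncertainty without assuming a particular noise distribution. A wide or strongly correlated spread in A and k would be one sign that the two parameters are hard to separate for that sequence.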

Highlights

  • Determining the genotype-phenotype relationships for any large set of biomolecules requires a high-throughput method to measure the chemical activity of each sequence in the set

  • Model identifiability depends on kinetic coefficients, experimental conditions, and measurement error

  • By comparing the distribution of each metric for sequences in the selected regions, both σ) and γ reflected the trend of model identifiability observed by examining individual curves: higher metric value corresponded to less separable parameters of a sequence (Figure S10)

Summary

Introduction

Determining the genotype-phenotype relationships for any large set of biomolecules requires a high-throughput method to measure the chemical activity of each sequence in the set. Since nucleic acid sequences act as their own ‘barcodes’, using high-throughput sequencing (HTS) as the assay avoids the need to isolate and test each unique sequence individually. Such a method can measure the activity of each sequence in a population of functional molecules at multiple time points, substrate concentrations, or other variable conditions. HTS-based kinetic measurements have been proposed and demonstrated with nucleic acids, including catalytic DNA [1], catalytic RNA [2,3,4,5], substrate RNA (“HTS-Kin”) [6], RNA aptamers [7], and transcription factors (TFs) binding to DNA [8]. In these studies, approximately 10^3 to 10^6 unique sequences are measured, depending on the experimental design.
