Abstract

High-throughput sequencing is subject to sequence-dependent bias, which must be accounted for if researchers are to make precise measurements and draw accurate conclusions from their data. A widely studied source of bias in sequencing is GC content bias, in which the GC content of a genomic region affects the number of reads produced during sequencing. Although some research has been performed on methods to correct for GC bias, there has been little effort to understand the underlying mechanism. The availability of sequencing protocols that target the specific location of structure in nucleic acid molecules enables us to investigate the molecular origin of observed GC bias in sequencing. By applying a parallel analysis of RNA structure (PARS) protocol to bacterial genomes of varying GC content, we are able to observe the relationship between local RNA secondary structure and sequencing outcome, and to establish RNA secondary structure as the significant contributing factor to observed GC bias.

Highlights

  • Single-stranded RNA molecules are known to fold into complex three-dimensional structures that vary depending on the molecular sequence [1]

  • Many methods have been developed to predict the folded conformations of RNA molecules, and several computational methods have become popular in recent years, such as MFold [4] and Vienna RNA [5], which make predictions of RNA folding conformations based on free energy calculations

  • To test one of the main hypotheses of this paper, that secondary structure is a possible cause of bias in the measurement of gene expression via high-throughput sequencing, we first performed correlation testing


Summary

Introduction

Single-stranded RNA molecules are known to fold into complex three-dimensional structures that vary depending on the molecular sequence [1]. An experimental method for detecting secondary structure across the entire transcriptome, called PARS, has been developed recently [6]. These technologies and methods make it possible to further investigate the role and effects of secondary structure. We investigate the effect RNA secondary structure has on gene expression data generated through modern sequencing technologies. It has been previously shown that there is a detectable dependence of read depth on GC content [7]. This effect has been observed in

PLOS ONE | DOI:10.1371/journal.pone.0173023 February 28, 2017

