Abstract

Existing small noncoding RNA analysis tools are optimized for processing short sequencing reads (17–35 nucleotides) to monitor microRNA expression. However, these strategies under-represent many biologically relevant classes of small noncoding RNAs in the 36–200 nucleotide length range (tRNAs, snoRNAs, etc.). To address this, we developed DANSR, a tool for the detection of annotated and novel small RNAs using sequencing reads of variable length (17–200 nt). While DANSR is broadly applicable to any small RNA dataset, we applied it to a cohort of matched normal, primary, and distant metastatic colorectal cancer specimens to demonstrate its ability to quantify annotated small RNAs, discover novel genes, and calculate differential expression. DANSR is available as an open source tool.

Highlights

  • A diverse range of small noncoding RNA species have been shown to contribute to human development and diseases [1,2,3]

  • We developed a new tool named DANSR, which can be broadly applied to large-scale sequencing data with variable read lengths to discover and quantify different classes of small noncoding RNAs

  • (4) A heuristic algorithm is applied to optimize the boundaries of each read cluster to reconstruct the small RNA (Figure 1B). (5) A network is built from uniquely aligned and multi-mapped reads to identify single-node and multi-node read clusters (Figure 1C), which are then used to identify low-quality read clusters caused by repetitive reads.
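
The network step in the last highlight can be illustrated with a minimal sketch. It is not DANSR's implementation: it assumes that each node is a read cluster and that two clusters are linked whenever at least one multi-mapped read aligns to both. The function names and the `read_to_clusters` input format are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_cluster_network(read_to_clusters):
    """Build an undirected graph of read clusters.

    read_to_clusters maps a read ID to the list of cluster IDs it aligns to
    (hypothetical input format). A uniquely aligned read lists one cluster;
    a multi-mapped read lists several and links them together.
    """
    adjacency = defaultdict(set)
    for clusters in read_to_clusters.values():
        unique = sorted(set(clusters))
        for cluster in unique:
            adjacency[cluster]  # register the node even if it has no edges
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of the graph as sorted lists."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def split_single_and_multi(read_to_clusters):
    """Separate single-node clusters from multi-node cluster groups."""
    components = connected_components(build_cluster_network(read_to_clusters))
    single = [c[0] for c in components if len(c) == 1]
    multi = [c for c in components if len(c) > 1]
    return single, multi


# Toy input: r2 and r3 are multi-mapped and chain clusters B, C and D together.
example_reads = {
    "r1": ["A"],
    "r2": ["B", "C"],
    "r3": ["C", "D"],
    "r4": ["E"],
}
single_nodes, multi_node_groups = split_single_and_multi(example_reads)
```

In this toy input, clusters A and E are supported only by uniquely aligned reads and end up as single-node clusters, while B, C and D form one multi-node group. Under the stated assumption, such groups would be the candidates to inspect for low-quality clusters caused by repetitive reads.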


Introduction

A diverse range of small noncoding RNA (small ncRNA, 17–200 nucleotides) species have been shown to contribute to human development and diseases [1,2,3]. Many existing large-scale research efforts, such as The Cancer Genome Atlas (TCGA), have generated data sets that enrich for small RNAs with lengths less than nucleotides (nt) [4], which resulted in many small RNA species greater than nt being under-represented. While this led to the development of tools that dramatically advanced the microRNA field [5,6,7], much remains to be understood about the biology of RNA species within the 36–200 nt range (Figure S1). The few tools capable of processing such data struggle with false positives and with accurate classification of small RNA transcripts (Table S1). To overcome these limitations, we developed a new tool named DANSR, which can be broadly applied to large-scale sequencing data with variable read lengths to discover and quantify different classes of small noncoding RNAs.

