Abstract

Background: Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a paucity of methods for finding small RNA generative locales.

Results: We describe and implement an algorithm that can determine small RNA generative locales from high-throughput sequencing data. The algorithm creates a network, or graph, of the small RNAs by creating links between them depending on their proximity on the target genome. For each of the sub-networks in the resulting graph, the clustering coefficient, a measure of the interconnectedness of the sub-network, is used to identify the generative locales. We test the algorithm over a wide range of parameters using RFAM sequences as positive controls and demonstrate that the algorithm has good sensitivity and specificity in a range of Arabidopsis and mouse small RNA sequence sets, and that the locales it generates are robust to differences in the choice of parameters.

Conclusions: NiBLS is a fast, reliable and sensitive method for determining small RNA locales in high-throughput sequence data that is generally applicable to all classes of small RNA.

Highlights

  • Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment

  • An edge e exists between two small regulatory RNAs (sRNAs) if the overlap is less than the minimum inclusion distance M, that is, $e = \{s_{c_1 i_1 j_1}, s_{c_2 i_2 j_2}\}$

  • For each connected set of sRNAs the clustering coefficient g as defined by Watts and Strogatz [16] is the average of the ratio of the number of edges that exist between the neighbours of each vertex in the component and the number that could possibly exist
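
The graph construction and the clustering coefficient described above can be illustrated with a minimal Python sketch. It is not the NiBLS implementation. Several details are assumptions made only for illustration: reads are represented as hypothetical `(chromosome, start, end)` tuples, two reads on the same chromosome are linked when the gap between them (negative if they overlap) is smaller than a threshold `m`, which is one plausible reading of the proximity criterion, and vertices with fewer than two neighbours contribute 0 to the average.

```python
from itertools import combinations


def build_graph(reads, m):
    """Link reads that lie close together on the same chromosome.

    reads: list of (chromosome, start, end) tuples (hypothetical format).
    m: proximity threshold (an assumed stand-in for the paper's M).
    Returns an adjacency dict mapping read index -> set of read indices.
    """
    adj = {k: set() for k in range(len(reads))}
    for a, b in combinations(range(len(reads)), 2):
        c1, i1, j1 = reads[a]
        c2, i2, j2 = reads[b]
        if c1 != c2:
            continue
        # Gap between the two intervals; negative when they overlap.
        gap = max(i1, i2) - min(j1, j2)
        if gap < m:  # assumed linking rule, not taken from the paper
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Return the connected components of the graph as a list of sets."""
    seen, comps = set(), []
    for v in adj:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def clustering_coefficient(adj, comp):
    """Average, over the vertices of a component, of the ratio of edges that
    exist between a vertex's neighbours to the number that could exist."""
    ratios = []
    for v in comp:
        nbrs = adj[v]
        k = len(nbrs)
        if k < 2:
            ratios.append(0.0)  # assumed convention for degree < 2
            continue
        links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
        ratios.append(links / (k * (k - 1) / 2))
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up coordinates: three mutually overlapping reads and one distant read.
    reads = [("chr1", 100, 121), ("chr1", 105, 126),
             ("chr1", 110, 131), ("chr1", 5000, 5021)]
    graph = build_graph(reads, m=10)
    for comp in connected_components(graph):
        print(sorted(comp), clustering_coefficient(graph, comp))
```

In this sketch, a tightly stacked group of reads gives a coefficient near 1, while a chain of reads that each touch only their immediate neighbours gives a coefficient near 0. How the coefficient is thresholded to call a locale is not specified in the text above and is left out here.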



Introduction

Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. High-throughput sequencing technologies such as Illumina’s Solexa, 454 Life Sciences’ GS-FLX and ABI’s SOLiD platforms allow researchers to generate gigabases of sequence data in a matter of hours [1]. As such, they are finding use in the analysis of many biological datasets, including the deep sequencing and cataloguing of non-coding small regulatory RNAs (sRNAs). These sRNAs have been described as the ‘dark matter of genetics’ [2] because they are highly abundant yet difficult to detect. Grouping the reads into locales that represent the place of origin of potential functional sRNAs is the step

