Abstract

BackgroundAlternative splicing allows the pre-mRNAs of a gene to be spliced into various mRNAs, which greatly increases the diversity of proteins. High-throughput sequencing of mRNAs has revolutionized our ability for transcripts reconstruction. However, the massive size of short reads makes de novo transcripts assembly an algorithmic challenge.ResultsWe develop a novel radical framework, called DTA-SiST, for de novo transcriptome assembly based on suffix trees. DTA-SiST first extends contigs by reads that have the longest overlaps with the contigs’ terminuses. These reads can be found in linear time of the lengths of the reads through a well-designed suffix tree structure. Then, DTA-SiST constructs splicing graphs based on contigs for each gene locus. Finally, DTA-SiST proposes two strategies to extract transcript-representing paths: a depth-first enumeration strategy and a hybrid strategy based on length and coverage. We implemented the above two strategies and compared them with the state-of-the-art de novo assemblers on both simulated and real datasets. Experimental results showed that the depth-first enumeration strategy performs always better with recall and also better with precision for smaller datasets while the hybrid strategy leads with precision for big datasets.ConclusionsDTA-SiST performs more competitive than the other compared de novo assemblers especially with precision measure, due to the read-based contig extension strategy and the elegant transcripts extraction rules.

Highlights

  • Alternative splicing allows the pre-mRNAs of a gene to be spliced into various mRNAs, which greatly increases the diversity of proteins

  • The pre-mRNAs of a gene can be spliced into various mRNAs, which results in the large difference between the number of genes and that of proteins [1, 2]

  • We realized DTA-SiST in two versions: DTA-SiST-E and DTA-SiST-H corresponding to the depth-first enumeration strategy and the hybrid strategy based on length and coverage, respectively

Read more

Summary

Introduction

Alternative splicing allows the pre-mRNAs of a gene to be spliced into various mRNAs, which greatly increases the diversity of proteins. High-throughput sequencing of mRNAs has revolutionized our ability for transcripts reconstruction. Alternative splicing plays an important role in regulating gene expression and producing diversity of proteins. The pre-mRNAs of a gene can be spliced into various mRNAs, which results in the large difference between the number of genes and that of proteins [1, 2]. A transcript is defined as a mature mRNA that encodes protein. We call the set of all the transcripts the transcriptome. The transcriptome can be seen as a precursor of the proteome, i.e., the entire set of proteins expressed by a genome.

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call