Abstract

Background: The performance of RNA sequencing (RNA-seq) aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand.

Results: Here, we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artifacts such as erroneous transcript chimerisms.

Conclusions: We have implemented this method in an open-source Python3 and Cython program, Mikado, available on GitHub.

Highlights

  • The performance of RNA sequencing (RNA-seq) aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand

  • In line with the previous RGASP evaluation, we performed our tests on three metazoan species, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens, using RNA-seq data from that study as input

  • Transcriptome assembly is a crucial component of genome annotation workflows; correctly reconstructing transcripts from short RNA-seq reads remains a challenging task


Summary

Introduction

The performance of RNA sequencing (RNA-seq) aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. For many of these species, there are only minimal expressed sequence tag (EST) and cDNA resources and limited availability of proteins from closely related species. In these cases, transcriptome data from high-throughput RNA-seq provides a vital source of evidence to aid gene structure annotation. Many approaches developed for this purpose leverage genomic alignments [9, 10, 11, 12], but there are alternatives based instead on de novo assembly [10, 13, 14]. While these methods focus on how to analyze a single dataset, related research has examined how to integrate assemblies from multiple samples. Some researchers advocate merging reads from multiple samples and assembling them jointly [10], whereas others have developed methods to integrate multiple assemblies into a single coherent annotation [9, 15].
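
To make the idea of integrating several assemblies into one annotation more concrete, the following is a minimal Python sketch, not Mikado's actual implementation or API. It assumes each transcript model carries a dictionary of precomputed metrics, and every name in it (Transcript, deduplicate, group_into_loci, pick_best, the metric names, and the weights) is hypothetical. It covers only redundancy removal and per-locus selection by user-weighted metrics; resolving chimeric transcripts is not attempted.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    # Hypothetical container; coordinates are 1-based, inclusive.
    tid: str
    chrom: str
    strand: str
    exons: tuple                      # sorted tuple of (start, end) pairs
    metrics: dict = field(default_factory=dict)

    @property
    def start(self):
        return self.exons[0][0]

    @property
    def end(self):
        return self.exons[-1][1]


def deduplicate(transcripts):
    """Keep the first transcript seen for each identical exon structure."""
    unique = {}
    for t in transcripts:
        unique.setdefault((t.chrom, t.strand, t.exons), t)
    return list(unique.values())


def group_into_loci(transcripts):
    """Cluster transcripts that overlap on the same chromosome and strand."""
    loci = []
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.strand, t.start))
    for t in ordered:
        last = loci[-1] if loci else None
        if (last and last[0].chrom == t.chrom and last[0].strand == t.strand
                and t.start <= max(x.end for x in last)):
            last.append(t)
        else:
            loci.append([t])
    return loci


def score(t, weights):
    """Weighted sum of the user-chosen metrics; missing metrics count as 0."""
    return sum(w * t.metrics.get(name, 0.0) for name, w in weights.items())


def pick_best(transcripts, weights):
    """Return the highest-scoring transcript of each locus."""
    loci = group_into_loci(deduplicate(transcripts))
    return [max(locus, key=lambda t: score(t, weights)) for locus in loci]


# Toy input: models from three imaginary assemblers (asmA, asmB, asmC).
assemblies = [
    Transcript("asmA.1", "chr1", "+", ((100, 200), (300, 400)),
               {"cds_fraction": 0.8, "exon_num": 2}),
    Transcript("asmB.1", "chr1", "+", ((100, 200), (300, 400)),
               {"cds_fraction": 0.8, "exon_num": 2}),   # duplicate of asmA.1
    Transcript("asmC.1", "chr1", "+", ((150, 200), (300, 450), (500, 600)),
               {"cds_fraction": 0.9, "exon_num": 3}),
    Transcript("asmB.2", "chr1", "-", ((1000, 1200),),
               {"cds_fraction": 0.5, "exon_num": 1}),
]
weights = {"cds_fraction": 1.0, "exon_num": 0.1}   # illustrative weights only

best = pick_best(assemblies, weights)
print([t.tid for t in best])
```

In this toy input, the two identical models collapse into one, the overlapping plus-strand models form a single locus, and the minus-strand model forms its own. A weighted sum is only one possible way to turn user-specified metrics into a ranking.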

