Abstract

Background
Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies. Unfortunately, it is still unclear which of these metrics accurately reflect assembly quality.

Results
We simulated the sequencing of Drosophila melanogaster transcripts. We assembled these simulated reads with both a "perfect" assembler and a modern transcriptome assembler while varying read length and sequencing depth. We then evaluated quality metrics to determine whether they 1) revealed perfect assemblies to be of higher quality, and 2) revealed perfect assemblies to be more complete as data quantity increased. Several commonly used metrics were not consistent with these expectations, including average contig coverage and length, though they became consistent when singletons were included in the analysis. We found several annotation-based metrics to be consistent and informative, including contig reciprocal best hit count and contig unique annotation count. Finally, we evaluated a number of novel metrics, such as reverse annotation count, contig collapse factor, and the ortholog hit ratio, and found that each assesses assembly quality in a unique way.

Conclusions
Although much attention has been given to transcriptome assembly, little research has focused on determining how best to evaluate assemblies, particularly in light of the variety of options available for read length and sequencing depth. Our results provide an important review of these metrics and give researchers tools to produce the highest-quality transcriptome assemblies.
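The annotation-based metrics named above can be illustrated with a minimal Python sketch. It assumes hits have already been reduced to hypothetical `(query, subject, bit_score)` tuples (for example, parsed from tabular BLAST output) and that the highest bit score defines a best hit; ties and e-value cutoffs are ignored. `ortholog_hit_ratio` uses one assumed formulation (length of the ortholog region covered by the hit divided by the ortholog's full length); the paper's exact definition may treat gaps or insertions differently. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified hit record: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to the subject with its highest bit score (first wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hit_count(contig_vs_ref: List[Hit], ref_vs_contig: List[Hit]) -> int:
    """Count contig/reference pairs in which each is the other's best hit."""
    forward = best_hits(contig_vs_ref)  # contig -> best reference
    reverse = best_hits(ref_vs_contig)  # reference -> best contig
    return sum(1 for contig, ref in forward.items() if reverse.get(ref) == contig)


def unique_annotation_count(contig_vs_ref: List[Hit]) -> int:
    """Number of distinct references that are the best hit of at least one contig."""
    return len(set(best_hits(contig_vs_ref).values()))


def ortholog_hit_ratio(subject_start: int, subject_end: int, ortholog_length: int) -> float:
    """Assumed OHR: span of the ortholog covered by the hit / full ortholog length."""
    covered = subject_end - subject_start + 1
    return covered / ortholog_length


if __name__ == "__main__":
    # Toy hits (hypothetical): contigs c1-c3 against reference transcripts r1-r3.
    contig_vs_ref = [("c1", "r1", 100.0), ("c1", "r2", 50.0),
                     ("c2", "r1", 80.0), ("c3", "r3", 60.0)]
    ref_vs_contig = [("r1", "c1", 100.0), ("r1", "c2", 80.0),
                     ("r2", "c1", 50.0), ("r3", "c3", 60.0)]
    print(reciprocal_best_hit_count(contig_vs_ref, ref_vs_contig))  # 2
    print(unique_annotation_count(contig_vs_ref))                   # 2
    print(ortholog_hit_ratio(1, 150, 300))                          # 0.5
```

In the toy data, contigs c1 and c2 both annotate as r1, so only one of them can be r1's reciprocal best hit; this is why the reciprocal best hit count and the unique annotation count can differ from the raw number of annotated contigs.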

Highlights

  • Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies

  • Evidence suggests that true gene expression distributions are complex [21]; it has been recognized for some time that a power-law distribution with exponent -1 provides an approximation [20]

  • By comparing D. melanogaster unigene ortholog hit ratios (OHRs) computed against both the B. mori and the D. melanogaster protein sets, we found that ortholog hit ratios computed against a related species generally give conservative estimates of how completely individual transcripts are assembled
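The power-law approximation in the highlights can be sketched as a simulation step, assuming transcripts are ranked by expression and the transcript at rank r receives relative abundance proportional to r^-1. This is an illustrative sketch with hypothetical function names, not the authors' simulation procedure; as noted above, true expression distributions are more complex [21].

```python
import random
from typing import List


def power_law_weights(n_transcripts: int, exponent: float = -1.0) -> List[float]:
    """Relative abundance of the rank-r transcript, proportional to r**exponent."""
    raw = [rank ** exponent for rank in range(1, n_transcripts + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_read_origins(n_reads: int, weights: List[float], seed: int = 0) -> List[int]:
    """Draw the source transcript index (0-based rank) for each simulated read."""
    rng = random.Random(seed)  # seeded for reproducible simulations
    return rng.choices(range(len(weights)), weights=weights, k=n_reads)


if __name__ == "__main__":
    weights = power_law_weights(4)
    # With exponent -1, rank 1 is twice as abundant as rank 2 and four times rank 4.
    print([round(w, 3) for w in weights])  # [0.48, 0.24, 0.16, 0.12]
    origins = sample_read_origins(10_000, weights)
    print(origins.count(0) / len(origins))  # roughly 0.48
```

Varying `n_reads` in such a sketch is one way to mimic the sequencing-depth axis described in the abstract: highly expressed transcripts saturate quickly, while low-abundance transcripts in the long tail need far more reads before they are covered.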


Summary

Introduction

Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies. It is still unclear which of these metrics accurately reflect assembly quality. Sequencing of Expressed Sequence Tags (ESTs) can quickly and cheaply provide sequence data for a large percentage of expressed transcripts, which can be assembled into longer transcript-representative sequences. Comparisons suggest that assemblers capable of accounting for alternative splicing perform best [8,13,16,17,18].

