Abstract

Background: De novo assembly of non-model organisms' transcriptomes has recently been on the rise, in concert with the number of de novo transcriptome assembly software programs. There is a knowledge gap as to which assembler or k-mer strategy is best for constructing an optimal de novo assembly. Additionally, there is a lack of consensus on which evaluation metrics should be used to assess the quality of de novo transcriptome assemblies.

Results: Six assembly strategies from four assemblers were evaluated. Trinity was run with its default single k-mer value of 25, while Bridger, Oases, and SOAPdenovo-Trans were run with multiple k-mer strategies. Bridger, Oases, and SOAPdenovo-Trans used a small multiple k-mer (SMK) strategy consisting of k-mer lengths of 21, 25, 27, 29, 31, and 33. Additionally, Oases and SOAPdenovo-Trans were run using a large multiple k-mer (LMK) strategy consisting of k-mer lengths of 25, 35, 45, 55, 65, 75, and 85. Eleven metrics were used to evaluate each assembly strategy: three genome-related evaluation metrics (contig number, N50 length, and contigs >1 kb) and eight transcriptome evaluation metrics (reads mapped back to transcripts (RMBT), number of full-length transcripts, number of open reading frames, Detonate RSEM-EVAL score, and percent alignment to the southern platyfish, Amazon molly, BUSCO, and CEGMA databases). The best-performing strategy, judged by how often it ranked within the top three for each evaluation metric, was the Bridger assembly (10 of 11), followed by the Oases SMK assembly (8 of 11), the Oases LMK assembly (6 of 11), the Trinity assembly (4 of 11), the SOAP LMK assembly (4 of 11), and the SOAP SMK assembly (3 of 11).

Conclusion: This study provides an in-depth investigation of multiple k-mer strategies and concludes that the assembler itself had a greater impact than k-mer size, regardless of the strategy employed. Additionally, the comprehensive set of transcriptome evaluation metrics used in this study highlights the need to choose metrics centered on user-defined research goals. Based on the evaluation metrics applied, Bridger constructed the best assembly of the testis transcriptome of Fundulus heteroclitus.
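
Two of the genome-related metrics named above, N50 length and the number of contigs longer than 1 kb, can be computed from contig lengths alone. The following is a minimal sketch using the common definition of N50 (the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the total assembled bases). The function names and the toy lengths are assumptions made for illustration; they are not taken from the study or from any of the assemblers it evaluates.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def count_longer_than(contig_lengths, threshold=1000):
    """Count contigs strictly longer than `threshold` bases (default 1 kb)."""
    return sum(1 for length in contig_lengths if length > threshold)


# Toy contig lengths, invented purely for illustration (not data from the study).
toy_lengths = [200, 300, 500, 1000, 1800, 2200]
toy_n50 = n50(toy_lengths)
toy_over_1kb = count_longer_than(toy_lengths)
```

Because both values depend only on contig lengths, they describe contiguity rather than correctness, which is one reason the abstract pairs them with transcriptome-specific metrics.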

Highlights

  • Next-generation sequencing (NGS) technologies have offered unprecedented opportunities to obtain genetic information for non-model organisms with little or no molecular information available [1]

  • There is a knowledge gap as to what assembler software or k-mer strategy is best for construction of an optimal de novo assembly

  • This study provides an in-depth multi k-mer strategy investigation concluding that the assembler itself had a greater impact than k-mer size regardless of the strategy employed

Summary

Introduction

Next-generation sequencing (NGS) technologies have offered unprecedented opportunities to obtain genetic information for non-model organisms with little or no molecular information available [1]. This increasingly accessible technology provides an efficient and cost-effective approach for analyzing the transcriptome of non-model organisms that lack a fully sequenced genome [2,3,4,5,6]. De novo transcriptome assembly takes the enormous number of short read sequences produced by NGS and overlaps them to form contiguous sequences (contigs) [9]. Low k-mer values tend to recover less abundant transcripts but produce a large number of contigs, many of them highly fragmented due to sequencing errors and lack of overlap [11, 13]. There is a lack of consensus on which evaluation metrics should be used to assess the quality of de novo transcriptome assemblies.
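
To make the role of k-mer length concrete, the sketch below splits two reads into k-mers and checks which k-mers they share. It is an illustration only: the helper names and the toy reads are assumptions, and it does not reproduce the graph construction used by any of the assemblers in this study. It shows just one point, that two reads can be linked through a shared k-mer only when their overlap is at least k bases long.

```python
def kmers(read, k):
    """Return every overlapping substring of length k in `read`, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def shared_kmers(read_a, read_b, k):
    """Return the set of k-mers present in both reads."""
    return set(kmers(read_a, k)) & set(kmers(read_b, k))


# Toy reads invented for illustration; they overlap by the 6 bases "GATTAC".
read_a = "ACGTGATTAC"
read_b = "GATTACCCTG"

small_k = shared_kmers(read_a, read_b, 4)  # k is shorter than the overlap
large_k = shared_kmers(read_a, read_b, 8)  # k is longer than the overlap
```

With k = 4 the two toy reads share k-mers and could be joined, while with k = 8 they share none. This is consistent with the observation above that smaller k values help recover transcripts with sparse read coverage, at the cost of more spurious or fragmented contigs.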
