Abstract

Background
Transcriptome analysis is increasingly being used to study the evolutionary origins and ecology of non-model plants. One issue for both transcriptome assembly and differential gene expression analyses is the common occurrence in plants of whole genome duplication (WGD) and of hybridisation resulting in allopolyploidy. The divergence of duplicated genes following WGD creates near-identical homeologues that can be problematic both for de novo assembly and for reference-based assembly protocols that use short reads (35-100 bp).

Results
Here we report a successful strategy for assembling two transcriptomes from 75 bp Illumina reads of Pachycladon fastigiatum and Pachycladon cheesemanii. Both are allopolyploid plant species (2n = 20) that originated in the New Zealand Alps about 0.8 million years ago. In a systematic analysis of 19 different coverage cutoffs and 20 different k-mer sizes, we showed that (i) no gene could be assembled across the whole parameter space, (ii) assembly of each gene required an optimal set of parameter values, and (iii) these parameter values could be explained in part by differences in gene expression level and in the degree of similarity between genes.

Conclusions
To obtain optimal transcriptome assemblies for allopolyploid plants, k-mer size and k-mer coverage need to be considered simultaneously across a broad parameter space. This is important both for assembling the maximum number of full-length ESTs and for avoiding chimeric assemblies of homeologous and paralogous gene copies.
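Why expression level interacts with the coverage cutoff can be illustrated with a toy k-mer counting sketch. It assumes error-free reads tiled along two random "transcripts", with expression modelled only as read spacing (one read every base for the highly expressed gene, one read every 25 bases for the lowly expressed gene). The helper names (`tile_reads`, `surviving_fraction`) and all numbers are hypothetical illustrations, not the authors' pipeline; real assemblers define and apply coverage cutoffs in their own ways.

```python
import random
from collections import Counter


def random_transcript(length, rng):
    """Generate a random nucleotide sequence standing in for a transcript."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def tile_reads(transcript, read_len, stride):
    """Hypothetical error-free reads: one read starting every `stride` bases.

    A smaller stride gives more reads per transcript, i.e. higher expression.
    """
    starts = range(0, len(transcript) - read_len + 1, stride)
    return [transcript[s:s + read_len] for s in starts]


def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def surviving_fraction(transcript, counts, k, cutoff):
    """Fraction of a transcript's k-mers whose count reaches the coverage cutoff."""
    kmers = [transcript[i:i + k] for i in range(len(transcript) - k + 1)]
    kept = sum(1 for kmer in kmers if counts[kmer] >= cutoff)
    return kept / len(kmers)


rng = random.Random(42)
high = random_transcript(300, rng)  # hypothetical highly expressed gene
low = random_transcript(300, rng)   # hypothetical lowly expressed gene
reads = tile_reads(high, 75, stride=1) + tile_reads(low, 75, stride=25)

counts = kmer_counts(reads, k=25)
for cutoff in (1, 5):
    print(cutoff,
          round(surviving_fraction(high, counts, 25, cutoff), 3),
          round(surviving_fraction(low, counts, 25, cutoff), 3))
```

With a cutoff of 1 both transcripts keep all of their k-mers, whereas a cutoff of 5 removes every k-mer of the lowly expressed transcript while the highly expressed one loses only a few k-mers at its ends. A cutoff suited to one gene can therefore make another gene unassemblable.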

Highlights

  • Transcriptome analysis is increasingly being used to study the evolutionary origins and ecology of non-model plants

  • Quality assessment of the reads and de novo assembly: two lanes of paired-end and one lane of single-end 75 bp Illumina reads were generated for P. fastigiatum, and one lane of single-end 75 bp reads for P. cheesemanii

  • All 75,175,754 reads of P. fastigiatum and 19,191,203 reads of P. cheesemanii were trimmed to retain the longest contiguous segment in which every base had a Phred quality score above the cutoff of 20; a score of 20 corresponds to an error probability of 0.01, i.e. one base-call error per 100 nucleotides
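The trimming rule in the highlight above can be sketched in a few lines. This is a minimal illustration, not necessarily the tool the authors used: "above the cutoff of 20" is interpreted as strictly greater than 20, and the default Phred+33 offset in the hypothetical `decode_qualities` helper is an assumption (older Illumina pipelines encoded qualities with an offset of 64).

```python
def decode_qualities(qual_string, offset=33):
    """Convert a FASTQ quality string to integer Phred scores.

    offset=33 assumes Sanger / Illumina 1.8+ encoding; older Illumina
    pipelines used offset=64.
    """
    return [ord(ch) - offset for ch in qual_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def longest_high_quality_segment(seq, quals, cutoff=20):
    """Return the longest contiguous segment whose bases all have Q > cutoff."""
    best_start, best_len = 0, 0
    run_start = 0
    for i, q in enumerate(quals):
        if q <= cutoff:
            run_start = i + 1  # a low-quality base ends the current run
            continue
        if i - run_start + 1 > best_len:
            best_start, best_len = run_start, i - run_start + 1
    return seq[best_start:best_start + best_len]


print(error_probability(20))
read = "ACGTACGTAC"
quals = [30, 30, 10, 35, 35, 35, 35, 5, 40, 40]
print(longest_high_quality_segment(read, quals))  # -> TACG
```

Keeping only the single longest run, rather than trimming the ends alone, guarantees that no retained base falls at or below the cutoff; a read with no base above the cutoff is reduced to an empty string.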


Summary

Introduction

Transcriptome analysis is increasingly being used to study the evolutionary origins and ecology of non-model plants. One issue for both transcriptome assembly and differential gene expression analyses is the common occurrence in plants of whole genome duplication (WGD) and of hybridisation resulting in allopolyploidy. For many plant species no close reference genome exists at all, which makes assembly even more challenging. In these cases, optimal assembly parameters must be assessed in order to generate full-length ESTs and to avoid chimeric sequences formed between homeologous copies, recently duplicated genes, and other very similar genes.
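One way to see how chimeras between homeologues arise is to count the k-mers two copies share. In de Bruijn graph assemblers, the usual approach for short-read de novo assembly, a k-mer present in both copies becomes a shared node through which an assembled path can switch from one copy to the other. The sketch below assumes two hypothetical homeologues that differ at exactly one base every 20 bp; the sequence, the spacing and the `make_homeologue` helper are illustrative only and are not taken from the study.

```python
import random

# Deterministic substitution so every designated position really differs.
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}


def make_homeologue(seq, spacing, offset):
    """Copy `seq`, substituting one base every `spacing` positions from `offset`."""
    bases = list(seq)
    for i in range(offset, len(bases), spacing):
        bases[i] = SUBSTITUTE[bases[i]]
    return "".join(bases)


def kmer_set(seq, k):
    """All distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a, b, k):
    """Number of distinct k-mers present in both sequences."""
    return len(kmer_set(a, k) & kmer_set(b, k))


rng = random.Random(7)
copy_a = "".join(rng.choice("ACGT") for _ in range(200))
copy_b = make_homeologue(copy_a, spacing=20, offset=10)  # 1 difference per 20 bp

for k in (11, 15, 19, 21, 25):
    print(k, shared_kmers(copy_a, copy_b, k))
```

Here the longest identical stretch between the copies is 19 bp, so from k = 20 upwards every k-mer spans at least one difference and the copies no longer share nodes, while smaller k values leave many shared k-mers. Larger k, however, needs more read coverage per k-mer, which is why k-mer size and coverage cutoff have to be tuned together.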
