Additive multiple k-mer transcriptome of the keelworm Pomatoceros lamarckii (Annelida; Serpulidae) reveals annelid trochophore transcription factor cassette

Nathan J Kenny, Sebastian M Shimeld

doi:10.1007/s00427-012-0416-6

Abstract

Recent advances in both next-generation sequencing and assembly programmes have made the low-cost construction of transcriptome datasets for non-model species feasible, yielding a raft of information even from less well-transcribed genes. Here we present the results of assemblies performed on a 51-bp paired-end Illumina dataset derived from a mixed larval sample of the annelid Pomatoceros lamarckii at 24, 48 and 72 h post-fertilization. We used Oases to assemble 36.5 million paired-end reads with k-mer sizes from 21 to 29, followed by amalgamation of assemblies, redundancy removal with Vmatch and TGICL, and removal of contigs less than 500 bp in length. This resulted in a final assembly of 50,151 contigs, with a mean length of 1,221 bp and covering 61.3 Mbp. A total of 34,846 (69.4%) of these returned a BlastX hit above a cutoff of 1.0e-3, and 17,967 (35.8%) were assigned at least one GO annotation using Blast2GO. We used the assembly to identify genes belonging to the homeobox superclass and the Fox, Sox and Tbx classes, recovering 37, 16, 4 and 3 genes, respectively. This included orthologues of genes previously unidentified in lophotrochozoans and protostomes. Our study illustrates the utility of such transcriptomic assembly methods as a gene discovery tool and greatly expands our knowledge of transcription factor genes in annelids in general and in this species in particular.
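
The final filtering and summary step described above (discarding contigs shorter than 500 bp, then reporting contig count, mean length and total span) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `read_fasta` and `filter_and_summarise` are hypothetical, the input is assumed to be a merged, redundancy-reduced FASTA file, and the assembly and redundancy-removal stages (Oases, Vmatch, TGICL) are not reproduced here.

```python
from statistics import mean


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_and_summarise(records, min_len=500):
    """Keep contigs of at least min_len bp and report basic statistics.

    min_len=500 mirrors the cutoff stated in the abstract; everything
    else in this function is an illustrative assumption.
    """
    kept = [(h, s) for h, s in records if len(s) >= min_len]
    lengths = [len(s) for _, s in kept]
    stats = {
        "contigs": len(lengths),
        "mean_length": mean(lengths) if lengths else 0,
        "total_bp": sum(lengths),
    }
    return kept, stats
```

With a real file, the records could be supplied as `read_fasta(open(path))`, where `path` is a placeholder for the merged assembly; the returned dictionary then holds the same three quantities the abstract reports for the final assembly.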

Full Text