Abstract

Diverse invertebrate taxa, including all 200,000 species of Hymenoptera (ants, bees, wasps and sawflies), have a haplodiploid sex determination system, where females are diploid and males are haploid. Thus, hymenopteran genome projects can make use of DNA from a single haploid male sample, which is assumed to be advantageous for genome assembly. For the purpose of gene annotation, transcriptome sequencing is usually conducted using RNA from a pool of individuals. We conducted a comparative analysis of genome and transcriptome assembly and annotation methods, using genetic sources of different ploidy: (1) DNA from a haploid male or a diploid female; (2) RNA from the same haploid male or a pool of individuals. We predicted that the use of a haploid male as opposed to a diploid female would simplify the genome assembly and gene annotation thanks to the lack of heterozygosity. Using DNA and RNA from the same haploid individual was expected to provide better confidence in transcript-to-genome alignment, and to improve the annotation of gene structure in terms of the exon/intron boundaries. The haploid genome assemblies proved to be more contiguous, with both contig and scaffold N50 size at least threefold greater than their diploid counterparts. Completeness evaluation showed mixed results. The SOAPdenovo2 diploid assembly was missing more genes than the haploid assembly. The SPAdes diploid assembly had more complete genes, but a higher level of duplicates and a greatly overestimated genome size. When aligning the two transcriptomes against the male genome, the male transcriptome gave 2–3% more complete transcripts than the pool transcriptome for genes with comparable expression levels in both transcriptomes. However, this advantage disappeared in the final results of the gene annotation pipeline that incorporates evidence from homologous proteins. The RNA pool is still required to obtain the full transcriptome, including genes that are expressed in other life stages and castes.
In conclusion, the use of haploid source material for a de novo genome project provides a substantial advantage to the quality of the genome draft, and the use of RNA from the same haploid individual for transcriptome-to-genome alignment provides a minor advantage for genes that are expressed in the adult male.
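
For readers unfamiliar with the contiguity metric quoted above, the following is a minimal sketch of how N50 is commonly computed: the length of the contig (or scaffold) at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name `n50` and the two toy assemblies are illustrative assumptions and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # total assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assemblies with the same total size (100 kb).
fragmented = [10_000] * 10                      # many short contigs
contiguous = [40_000, 30_000, 20_000, 10_000]   # fewer, longer contigs

print(n50(fragmented), n50(contiguous))
```

In this toy case the more contiguous assembly has an N50 three times that of the fragmented one, even though both cover the same number of bases.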

Highlights

  • Whole genome de novo assembly is a crucial component in various types of genetic research

  • The largest haplodiploid animal clade is the Hymenoptera, including more than 200,000 species of ants, bees, wasps, and sawflies. Previous hymenopteran genome projects, such as the leafcutter ant Acromyrmex echinatior[8] and the fire ant Solenopsis invicta[9], already used haploid males as their main source for genome sequencing and assembly, alongside a pool of workers for transcriptome sequencing

  • This study evaluated the utility of haploid samples as the source for both genomic and transcriptomic material in a de novo genome sequencing project


Summary

Introduction

Whole genome de novo assembly is a crucial component in various types of genetic research. During the de Bruijn graph walkthrough, the assembler must deal with repetitive elements by resolving alternative or circular paths (“bubbles”). This is often impossible when extending contigs through repetitive sequences longer than the read length[2] and typically results in a highly fragmented assembly, consisting of non-repetitive fragments ending in unresolved repetitive sequences[3]. The largest haplodiploid animal clade is the Hymenoptera, including more than 200,000 species of ants, bees, wasps, and sawflies. Previous hymenopteran genome projects, such as the leafcutter ant Acromyrmex echinatior[8] and the fire ant Solenopsis invicta[9], already used haploid males as their main source for genome sequencing and assembly, alongside a pool of workers for transcriptome sequencing. The advantage of using both RNA and DNA from the same male individual was evaluated, with the expectation that this would provide greater confidence in transcript-to-genome alignment and improve the annotation of gene structures in terms of their exon/intron boundaries.
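
To illustrate why alternative paths complicate a de Bruijn graph, the sketch below builds a toy graph from two made-up sequences that differ at a single position, as the two haplotypes of a heterozygous diploid might. The sequences, the choice of k = 4, and the helper names `de_bruijn` and `branch_nodes` are assumptions for illustration only; this is not how SOAPdenovo2 or SPAdes are implemented.

```python
from collections import defaultdict


def de_bruijn(sequences, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Return nodes with more than one outgoing edge (path ambiguity)."""
    return sorted(node for node, nxt in graph.items() if len(nxt) > 1)


# Hypothetical haplotypes differing by one base (C vs G at position 7).
hap_a = "ACGTTGCAAGCT"
hap_b = "ACGTTGGAAGCT"

haploid = de_bruijn([hap_a], k=4)
diploid = de_bruijn([hap_a, hap_b], k=4)

print(branch_nodes(haploid))  # no branching in the single-haplotype graph
print(branch_nodes(diploid))  # the variant opens a bubble at one node
```

With a single haplotype the graph is one unbranched path, whereas adding the second haplotype creates a fork that rejoins a few nodes later, which is the kind of bubble an assembler must resolve.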

