Abstract
Diverse invertebrate taxa, including all 200,000 species of Hymenoptera (ants, bees, wasps and sawflies), have a haplodiploid sex determination system, in which females are diploid and males are haploid. Thus, hymenopteran genome projects can make use of DNA from a single haploid male sample, which is assumed to be advantageous for genome assembly. For the purpose of gene annotation, transcriptome sequencing is usually conducted using RNA from a pool of individuals. We conducted a comparative analysis of genome and transcriptome assembly and annotation methods, using genetic sources of different ploidy: (1) DNA from a haploid male or a diploid female, and (2) RNA from the same haploid male or a pool of individuals. We predicted that the use of a haploid male as opposed to a diploid female would simplify the genome assembly and gene annotation thanks to the lack of heterozygosity. Using DNA and RNA from the same haploid individual was expected to provide better confidence in transcript-to-genome alignment, and to improve the annotation of gene structure in terms of the exon/intron boundaries. The haploid genome assemblies proved to be more contiguous, with both contig and scaffold N50 size at least threefold greater than their diploid counterparts. Completeness evaluation showed mixed results. The SOAPdenovo2 diploid assembly was missing more genes than the haploid assembly. The SPAdes diploid assembly had more complete genes, but a higher level of duplicates and a greatly overestimated genome size. When aligning the two transcriptomes against the male genome, the male transcriptome gave 2–3% more complete transcripts than the pool transcriptome for genes with comparable expression levels in both transcriptomes. However, this advantage disappeared in the final results of the gene annotation pipeline, which incorporates evidence from homologous proteins. The RNA pool is still required to obtain the full transcriptome, including genes that are expressed in other life stages and castes.
In conclusion, the use of haploid source material for a de novo genome project provides a substantial advantage to the quality of the genome draft, and the use of RNA from the same haploid individual for transcriptome-to-genome alignment provides a minor advantage for genes that are expressed in the adult male.
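The contiguity comparison in the abstract is stated in terms of N50. As a minimal sketch of how that statistic is commonly computed (the length of the contig or scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size), the following may help. The function name `n50` and both lists of lengths are hypothetical illustrations, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50
    is the length of the sequence at which the running total first
    reaches at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up lengths for illustration only: same total size (2,300),
# but the second assembly is split into many more, shorter pieces.
hypothetical_haploid = [900, 700, 400, 200, 100]
hypothetical_diploid = [300, 250, 250, 200, 200, 200, 150,
                        150, 100, 100, 100, 100, 100, 100]

print(n50(hypothetical_haploid))  # 700
print(n50(hypothetical_diploid))  # 200
```

In this toy case, the two assemblies cover the same number of bases, yet the more fragmented one has a much smaller N50, which is the kind of difference the contiguity comparison captures.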
Highlights
Whole genome de novo assembly is a crucial component in various types of genetic research
The largest haplodiploid animal clade is the Hymenoptera, including more than 200,000 species of ants, bees, wasps, and sawflies. Haploid sequencing was already put into practice in previous hymenopteran genome projects, such as the leafcutter ant Acromyrmex echinatior[8] and the fire ant Solenopsis invicta[9], which used haploid males as their main source for genome sequencing and assembly, alongside a pool of workers for transcriptome sequencing
This study evaluated the utility of haploid samples as the source for both genomic and transcriptomic material in a de novo genome sequencing project
Summary
Whole genome de novo assembly is a crucial component in various types of genetic research. During the de Bruijn graph walkthrough, the assembler must deal with repetitive elements by resolving alternative or circular paths (“bubbles”). This is often impossible when extending contigs through repetitive sequences longer than the read length[2] and typically results in a highly fragmented assembly, consisting of non-repetitive fragments ending in unresolved repetitive sequences[3]. The largest haplodiploid animal clade is the Hymenoptera, including more than 200,000 species of ants, bees, wasps, and sawflies. Haploid sequencing was already put into practice in previous hymenopteran genome projects, such as the leafcutter ant Acromyrmex echinatior[8] and the fire ant Solenopsis invicta[9], which used haploid males as their main source for genome sequencing and assembly, alongside a pool of workers for transcriptome sequencing. The advantage of using both RNA and DNA from the same male individual was evaluated, with the expectation that this would provide greater confidence in transcript-to-genome alignment, and improve the annotation of gene structures in terms of their exon/intron boundaries.
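To illustrate why alternative paths arise in a de Bruijn graph, and why a haploid source avoids the ones caused by heterozygosity, here is a toy sketch. It is not the algorithm used by SOAPdenovo2 or SPAdes; the sequences, the k-mer size, and the helper names `de_bruijn_edges` and `branching_nodes` are all assumptions made for illustration. Two hypothetical haplotypes differing at a single base produce one branching node (the start of a bubble), whereas a single haplotype produces none.

```python
from collections import defaultdict


def de_bruijn_edges(sequences, k):
    """Build a simple de Bruijn graph: each (k-1)-mer maps to the set of
    (k-1)-mers that follow it in at least one k-mer of the input."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return the nodes with more than one successor, i.e. the points
    where an assembler has to choose between alternative paths."""
    return sorted(node for node, successors in graph.items()
                  if len(successors) > 1)


# Hypothetical haplotypes that differ by one base (C vs A at index 6).
haplotype_a = "ACGTTGCATGGA"
haplotype_b = "ACGTTGAATGGA"

haploid_graph = de_bruijn_edges([haplotype_a], k=4)
diploid_graph = de_bruijn_edges([haplotype_a, haplotype_b], k=4)

print(branching_nodes(haploid_graph))  # []
print(branching_nodes(diploid_graph))  # ['TTG']
```

In the diploid toy graph, the two paths leaving `TTG` rejoin at `ATG`, forming a bubble. Real assemblers use much larger k, error correction, and coverage information, so this only conveys the basic idea.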