Abstract

Next-generation DNA sequencing technologies have led to exponential growth in the number of genomes being sequenced. While generating whole genome shotgun (WGS) assemblies using next-generation sequencing (NGS) is relatively fast and inexpensive, applying this approach to highly repetitive and complex genomes such as wheat results in inferior assemblies, slowing efforts to identify markers for crop improvement. The wheat genome is large, highly repetitive and polyploid. Several approaches have been used to sequence and assemble the wheat genome, with variable success. Published approaches such as whole chromosome shotgun (WCS) and WGS have resulted in draft assemblies that are incomplete, fragmented or represent only a subset of the targeted genomic region. BAC by BAC approaches offer the most accurate assemblies but are expensive and labour intensive. This thesis presents the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. The approach is demonstrated by sequencing and assembling BAC cloned fragments from bread wheat chromosome arm 7DS. It enables the generation of accurate, scalable and reproducible assemblies at lower cost than traditional BAC by BAC approaches. Rigorous assembly validation prior to gene annotation and onward analysis is critical in genome sequencing projects but is often missing. This thesis demonstrates rigorous validation of the bread wheat chromosome arm 7DS BAC assemblies using multiple independent platforms. Novel approaches for de novo assembly validation are also presented.
The BAC assemblies were successfully validated using BAC end sequences (BES), mate pair read mappings and orientations, the BAC minimum tiling path (MTP), and a chromosome arm 7DS BioNano optical map. Annotation of the chromosome arm 7DS BAC assemblies revealed 5,063 more gene calls than a 7DS chromosome arm shotgun assembly. The chromosome arm 7DS BAC assemblies were also less fragmented, as reflected in the lengths of the scaffolds, isoforms and genes called, compared to the 7DS whole chromosome arm shotgun assembly. A depletion in specific core gene families with domains containing amino acid repeats, such as coiled-coil proteins and golgins, was observed in the 7DS whole chromosome arm shotgun assemblies when compared to the 7DS whole chromosome arm BAC by BAC assembly. This thesis has demonstrated that accurate genome assemblies of large, complex polyploid genomes are possible using Illumina short reads, despite the challenge posed by repeats. It has also demonstrated that the choice of assembly approach affects not only the contiguity and accuracy of an assembly, but also the number and quality of structural and functional gene annotations and the resulting gene families.
