The assembly and scaffolding of plant crop genomes facilitate the characterization of genetically diverse cultivated and wild germplasm. The cultivated tomato (Solanum lycopersicum) has been improved through the introgression of genetic material from related wild species, including resistance to pandemic strains of tobacco mosaic virus (TMV) from Solanum peruvianum. Here we applied PacBio HiFi and ONT Nanopore sequencing to develop independent, highly contiguous and complementary assemblies of an inbred TMV-resistant tomato variety. We show specific examples of how HiFi and ONT datasets can complement one another to improve assembly contiguity. We merged the HiFi and ONT assemblies to generate a long-read-only assembly where all 12 chromosomes were represented as 12 contiguous sequences (N50 = 68.5 Mbp). This chromosome scale assembly did not require scaffolding using an orthogonal data type. The merged assembly was validated by chromosome conformation capture data and is highly consistent with previous tomato genome assemblies that made use of genetic maps and Hi-C for scaffolding. Our long-read-only assembly reveals that a complex series of structural variants linked to the TMV resistance gene likely contributed to linkage drag of a 64.1-Mbp region of the S. peruvianum genome during tomato breeding. Through marker studies and ONT-based comprehensive haplotyping we show that this minimal introgression region is present in six cultivated tomato hybrid varieties developed in three commercial breeding programs. Our results suggest that complementary long read technologies can facilitate the rapid generation of near-complete genome sequences.
Read full abstract