Abstract
It is only recently, with the advent of long-read sequencing technologies, that we are beginning to uncover previously uncharted regions of complex and inherently recursive plant genomes. To comprehensively study and exploit the genome of the neglected oilseed Brassica nigra, we generated two high-quality nanopore de novo genome assemblies. The N50 contig lengths for the two assemblies were 17.1 Mb (12 contigs), one of the best among 324 sequenced plant genomes, and 0.29 Mb (424 contigs), respectively, reflecting recent improvements in the technology. Comparison with a de novo short-read assembly corroborated genome integrity and quantified sequence-related error rates (0.2%). The contiguity and coverage allowed unprecedented access to low-complexity regions of the genome. Pericentromeric regions and coincidence of hypomethylation enabled localization of active centromeres and identified centromere-associated ALE family retro-elements that appear to have proliferated through relatively recent nested transposition events (<1 Ma). Genomic distances calculated based on synteny relationships were used to define a post-triplication Brassica-specific ancestral genome, and to calculate the extensive rearrangements that define the evolutionary distance separating B. nigra from its diploid relatives.
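The N50 contig length quoted above is a standard contiguity metric: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The contig count reported in parentheses appears to be the matching L50, that is, the size of that set. The sketch below shows one common way to compute both values. The function name `n50_l50` and the toy contig lengths are illustrative assumptions, not the authors' pipeline or data.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for an iterable of contig lengths.

    N50: length of the contig at which the running total, taken from the
         longest contig downwards, first reaches half the assembly size.
    L50: number of contigs needed to reach that point.
    """
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")

    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_lengths = [8, 5, 4, 2, 1]
print(n50_l50(toy_lengths))  # (5, 2): 8 + 5 = 13 covers half of the total of 20
```

Under this definition, a larger N50 paired with a smaller L50 indicates a more contiguous assembly, which is the sense in which the two assemblies are compared above.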
Highlights
Decoding complete genome information is vital for understanding genome structure, providing a full complement of both the genic and repeat repertoire and uncovering structural variation
Centromeres are of particular interest due to their biological importance, yet resolving their structure has been frustrated by the prevalence of repetitive elements. Centromeres are commonly marked by short, tandemly repeated sequences; however, as in one other very small plant genome[6], no such sequence has been identified for Brassica nigra[7,8]
Recent advancements and cost reductions in LR sequencing technologies are facilitating the generation of high-quality genome assemblies, even for species that have evolved through recursive whole-genome duplication (WGD) events[43]
Summary
Decoding complete genome information is vital for understanding genome structure, providing a full complement of both the genic and repeat repertoire and uncovering structural variation. Recent advances in long-read (LR) sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)[9], combined with genome scaffolding methods such as optical mapping and chromosome conformation capture (Hi-C), have led to a paradigm shift in our ability to obtain complete and contiguous genome assemblies[9,10,11]. Although both LR approaches can produce remarkably long reads, their error rate is markedly higher than that of Illumina short reads, which until recently limited their use to scaffolding to improve assembly contiguity[12]. Computationally defined genomic distances between the three Brassica diploid genomes allowed the construction of an ancestral Brassica-specific genome.