Abstract
The complete assembly of each human chromosome is essential for understanding human biology and evolution[1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
Highlights
Unlike the assembly of the human X chromosome[13], we took advantage of both ultra-long Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) high-fidelity (HiFi) data to resolve the gaps in human chromosome 8 (Fig. 1a, b, Methods).
Most of the additions reside within distinct chromosomal regions: a 644-kb copy number polymorphic β-defensin gene cluster that maps to chromosome 8p23.1 (Fig. 1c, d); the complete centromere, corresponding to 2.08 Mb of α-satellite higher-order repeats (HORs) (Fig. 2); an 863-kb 8q21.2 variable number tandem repeat (VNTR) (Extended Data Fig. 1); and both telomeric regions, which end with the canonical TTAGGG repeat sequence (Extended Data Fig. 2).
To our knowledge, chromosome 8 is the first human autosome to be sequenced and assembled from telomere to telomere, and it contains only the third completed human centromere[13,28]. The centromeres of both chromosome 8 and chromosome X (Supplementary Fig. 7) contain a pocket of hypomethylation, and we show that this region is enriched for the centromeric histone CENP-A, consistent with the functional kinetochore-binding site[29,30].
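The highlights note that both telomeric regions end with the canonical TTAGGG repeat. As a rough illustration of how such an ending could be checked on an assembled sequence, the sketch below counts consecutive exact copies of a repeat unit at one end of a string. The function name `count_terminal_repeats` and the toy sequence are hypothetical and are not taken from the authors' methods; real telomere calls would also need to tolerate sequencing errors and variant repeats.

```python
def count_terminal_repeats(seq: str, unit: str = "TTAGGG", from_end: bool = True) -> int:
    """Count consecutive exact copies of `unit` at one end of `seq`.

    Hypothetical helper for illustration only; exact matching, no error tolerance.
    """
    seq = seq.upper()
    unit = unit.upper()
    k = len(unit)
    count = 0
    if from_end:
        # Walk backwards from the 3' end one repeat unit at a time.
        pos = len(seq)
        while pos - k >= 0 and seq[pos - k:pos] == unit:
            count += 1
            pos -= k
    else:
        # Walk forwards from the 5' end one repeat unit at a time.
        pos = 0
        while seq[pos:pos + k] == unit:
            count += 1
            pos += k
    return count


# Toy sequence (invented): on the forward strand the p-arm end reads as the
# reverse complement CCCTAA, and the q-arm end reads as TTAGGG.
toy = "CCCTAA" * 3 + "ACGTACGT" + "TTAGGG" * 4
q_arm_copies = count_terminal_repeats(toy)
p_arm_copies = count_terminal_repeats(toy, unit="CCCTAA", from_end=False)
```

On the toy sequence this finds four TTAGGG copies at the 3' end and three CCCTAA copies at the 5' end.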
Summary
Applying our assembly approach to ONT and HiFi data available for a diploid human genome (HG00733) (Supplementary Table 3, Methods) generates two additional chromosome 8 centromere haplotypes, replicating the overall organization with only subtle differences in the overall length of HOR arrays (Extended Data Fig. 7, Supplementary Table 4). The third layer is completely composed of HORs. The p and q regions are 92 and 149 kb in length, respectively, and share more than 96% sequence identity with each other (Fig. 2a, arrow 3) but less than that with the rest of the centromere. This layer consists largely of homogenous 11-monomer HORs and defines the transition from unmethylated to methylated DNA.
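The summary compares regions by percentage sequence identity (for example, more than 96% between the p and q regions). A minimal sketch of one common way to compute such a figure from two already-aligned sequences follows. It assumes the alignment has been produced elsewhere, uses `-` for gaps, and counts a gap against a base as a mismatch; this is one convention among several, and the source does not state which the authors used. The name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumptions (not from the source): inputs are pre-aligned and equal in
    length, '-' marks a gap, gap-vs-base columns count as mismatches, and
    columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


# Invented toy alignments, not real centromere sequence.
identity_substitution = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
identity_with_gap = percent_identity("ACG-T", "ACGAT")
```

The first toy pair differs at one of ten columns (90% identity); the second has one gap in five columns (80% identity under the gap-as-mismatch convention).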