Abstract
Centromeres are essential chromosomal regions that mediate kinetochore assembly and spindle attachments during cell division. Despite their functional conservation, centromeres are among the most rapidly evolving genomic regions and can shape karyotype evolution and speciation across taxa. Although significant progress has been made in identifying centromere-associated proteins, the highly repetitive centromeres of metazoans have been refractory to DNA sequencing and assembly, leaving large gaps in our understanding of their functional organization and evolution. Here, we identify the sequence composition and organization of the centromeres of Drosophila melanogaster by combining long-read sequencing, chromatin immunoprecipitation for the centromeric histone CENP-A, and high-resolution chromatin fiber imaging. Contrary to previous models that heralded satellite repeats as the major functional components, we demonstrate that functional centromeres form on islands of complex DNA sequences enriched in retroelements that are flanked by large arrays of satellite repeats. Each centromere displays distinct size and arrangement of its DNA elements but is similar in composition overall. We discover that a specific retroelement, G2/Jockey-3, is the most highly enriched sequence in CENP-A chromatin and is the only element shared among all centromeres. G2/Jockey-3 is also associated with CENP-A in the sister species D. simulans, revealing an unexpected conservation despite the reported turnover of centromeric satellite DNA. Our work reveals the DNA sequence identity of the active centromeres of a premier model organism and implicates retroelements as conserved features of centromeric DNA.
Highlights
Centromeres are marked by the histone H3 variant centromere protein A (CENP-A; called centromere identifier [Cid] in Drosophila and centromeric histone H3 [CenH3] in plants), which is necessary and sufficient for kinetochore activity [1, 2]
We took four complementary approaches to discover regions of the genome enriched for CENP-A: (1) identifying simple repeats enriched for CENP-A based on kmers, (2) mapping reads to a comprehensive repeat library to summarize enriched transposable elements (TEs) and complex repeats, (3) using de novo assembly methods to assemble contigs from the chromatin immunoprecipitation (ChIP) reads and calculating enrichment relative to input post hoc, and (4) mapping reads to a heterochromatin-enriched assembly [19] and calling ChIP peaks (Fig 1A)
We found that CENP-A is strongly associated with retroelements, in particular non-long terminal repeat (non-LTR), long interspersed nuclear element (LINE)-like elements in the Jockey family, and with the intergenic spacer (IGS) of the ribosomal genes
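The first of the four approaches above, identifying simple repeats enriched for CENP-A based on kmers, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`count_kmers`, `kmer_enrichment`), the kmer length, the pseudocount, and the toy reads are all hypothetical choices made for illustration. The idea is only to compare kmer frequencies between ChIP reads and input reads and report a log2 enrichment per kmer.

```python
from collections import Counter
from math import log2


def count_kmers(reads, k):
    """Count every k-mer across an iterable of read sequences (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_enrichment(chip_reads, input_reads, k=5, pseudocount=1.0):
    """Return {kmer: log2(ChIP frequency / input frequency)}.

    Assumption: frequencies are normalized by the total k-mer count of each
    library, with a pseudocount so that k-mers absent from one library do
    not cause a division by zero.
    """
    chip = count_kmers(chip_reads, k)
    inp = count_kmers(input_reads, k)
    kmers = set(chip) | set(inp)
    n = len(kmers)
    chip_total = sum(chip.values()) + pseudocount * n
    inp_total = sum(inp.values()) + pseudocount * n
    scores = {}
    for kmer in kmers:
        chip_freq = (chip[kmer] + pseudocount) / chip_total
        inp_freq = (inp[kmer] + pseudocount) / inp_total
        scores[kmer] = log2(chip_freq / inp_freq)
    return scores


# Toy reads (invented for illustration): AAGAG-rich reads dominate the "ChIP" set.
chip_reads = ["AAGAGAAGAGAAGAG"] * 3 + ["ACGTACGTACGTACG"]
input_reads = ["AAGAGAAGAGAAGAG"] + ["ACGTACGTACGTACG"] * 3
scores = kmer_enrichment(chip_reads, input_reads, k=5)
top = sorted(scores, key=scores.get, reverse=True)[:3]
```

A real analysis would also need to collapse each kmer with its reverse complement and with rotations of the same repeat unit, and to assess enrichment across replicates; none of that is attempted here.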
Summary
Centromeres are marked by the histone H3 variant centromere protein A (CENP-A; called centromere identifier [Cid] in Drosophila and centromeric histone H3 [CenH3] in plants), which is necessary and sufficient for kinetochore activity [1, 2]. Efforts to determine the structural organization of centromeres in D. melanogaster combined deletion analyses and sequencing of an X-derived minichromosome, Dp1187. These studies mapped the minimal DNA sequences sufficient for centromere function to a 420-kb region containing the AAGAG and AATAT satellites interspersed with “islands” of complex sequences [14, 15]. It is unclear which parts of this minimal region comprise the active centromere, whether it corresponds to the native X chromosome centromere, and whether other centromeres have a similar organization. Satellites have been regarded as the major structural elements of Drosophila, human, and mouse centromeres [2, 3, 17].