Abstract
The major DNA constituent of primate centromeres is alpha-satellite DNA. As much as 2%–5% of the sequence generated by primate genome sequencing projects consists of this material, which, because of its highly repetitive nature, is either fragmented or left unassembled in published genome sequences. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm that computationally predicts potential higher-order array structure from paired-end sequence data, and we then validate the organization and distribution of these arrays experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationships among these sequences and provide further support for a model of their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites between the Old World monkey and ape lineages.
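To make the idea of "recovering" satellite reads from unassembled shotgun data concrete, the sketch below screens reads by the fraction of their k-mers shared with a reference set. This is a generic k-mer screen, not the authors' published method; the function names, the k-mer size of 12, the 0.5 threshold, and the toy reference sequence are all hypothetical choices made for illustration.

```python
# Hypothetical k-mer screen for satellite-like reads; NOT the authors' published method.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: list[str], k: int) -> set[str]:
    """k-mers of each reference and its reverse complement, so reads from either strand match."""
    index: set[str] = set()
    for ref in references:
        index |= kmers(ref, k) | kmers(revcomp(ref), k)
    return index


def satellite_fraction(read: str, index: set[str], k: int) -> float:
    """Fraction of the read's distinct k-mers that occur in the reference index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


def classify_reads(reads: list[str], references: list[str], k: int = 12,
                   threshold: float = 0.5) -> list[bool]:
    """Flag each read as satellite-like if enough of its k-mers hit the index."""
    index = build_index(references, k)
    return [satellite_fraction(r, index, k) >= threshold for r in reads]


# Toy reference: an arbitrary sequence, NOT a real alpha-satellite consensus.
TOY_REFERENCE = "AATGGCTTAGCAGTTCGATCCAGTACGGATTCAAGCTTGACTAGGCATCGTA"
reads = [TOY_REFERENCE[5:45], revcomp(TOY_REFERENCE[10:50]), "G" * 40]
print(classify_reads(reads, [TOY_REFERENCE]))  # [True, True, False]
```

Indexing the reverse complement of each reference is what lets the second read, drawn from the opposite strand, be recovered.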
Highlights
Alpha-satellite is the only functional DNA sequence associated with all naturally occurring human centromeres
A large fraction is arranged into higher-order repeat (HOR) arrays, in which alpha-satellite monomers are organized as multimeric repeat units that are tandemly repeated to form arrays ranging in size from 3–5 Mb [1]
Centromeric DNA has been described as the last frontier of genomic sequencing; such regions are typically poorly assembled during the whole-genome shotgun sequence assembly process due to their repetitive complexity
Summary
Alpha-satellite is the only functional DNA sequence associated with all naturally occurring human centromeres. A large fraction is arranged into higher-order repeat (HOR) arrays (known as chromosome-specific arrays), in which alpha-satellite monomers are organized as multimeric repeat units that are tandemly repeated to form arrays ranging in size from 3–5 Mb [1]. While individual human alpha-satellite monomer units show 20%–40% single-nucleotide variation, the sequence divergence between higher-order repeat units is typically less than 2% [2,3] (Figure 1). Unequal crossover of satellite DNA between sister chromatids, or between homologous chromosomes during meiosis, is largely responsible for copy-number differences and is thought to be fundamental to the evolution of these HOR arrays. The organization and unit of periodicity of these arrays are specific to each human chromosome [4,5], and the individual monomer units are classified into one of five suprafamilies based on their sequence properties [5,6]. Studies of closely related primates, such as the chimpanzee and orangutan [2,7], indicate that these chromosome-specific associations do not persist among the centromeres of homologous chromosomes, implying that the structure and content of centromeric DNA change very quickly over relatively short periods of evolutionary time.
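The contrast between 20%–40% divergence among monomers and under 2% divergence between HOR units suggests a simple way to estimate an array's periodicity: compare each monomer with the one p positions downstream and pick the p with the lowest mean divergence. The sketch below is a hypothetical illustration of that idea, not the paper's algorithm; it assumes the monomers are already pre-aligned to equal length, so Hamming distance can stand in for a proper alignment-based divergence.

```python
from statistics import mean
from typing import Optional


def divergence(a: str, b: str) -> float:
    """Proportion of mismatched positions between two pre-aligned, equal-length monomers."""
    if len(a) != len(b):
        raise ValueError("monomers must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def hor_period(monomers: list[str], max_period: int) -> Optional[tuple[int, float]]:
    """Return (period, mean divergence) minimizing divergence of monomer i vs monomer i + period."""
    best: Optional[tuple[int, float]] = None
    for p in range(1, max_period + 1):
        pairs = [divergence(monomers[i], monomers[i + p]) for i in range(len(monomers) - p)]
        if not pairs:  # array too short for this period
            break
        score = mean(pairs)
        if best is None or score < best[1]:  # strict < keeps the smallest period on ties
            best = (p, score)
    return best


# Toy 3-monomer HOR unit repeated 4 times (arbitrary sequences, not real monomers).
a = "AAAAACCCCCGGGGGTTTTT"
b = "ACGTACGTACGTACGTACGT"
c = "TTTTTGGGGGCCCCCAAAAA"
array = [a, b, c] * 4
print(hor_period(array, max_period=5))  # (3, 0.0)
```

Because multiples of the true period (here 6, 9, ...) also score well, the strict comparison deliberately reports the smallest period that reaches the minimum.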