The publication of the human genome draft sequence provides, for the first time, a global view of the structural properties of the human genome. Initial sequence analysis, in combination with previously published reports, reveals that more than half of the transition regions between euchromatin and centromeric heterochromatin contain duplicated segments. The individual duplications originate from diverse euchromatic regions of the human genome and often contain the intron-exon structure of known genes. Multiple duplicons are concatenated to form larger blocks of wall-to-wall duplications. For a single chromosome, these paralogous segments can span >1 Mb of sequence and define a buffer zone between unique sequence and tandemly repeated satellite sequences. Unusual pericentromeric interspersed repeat elements have been identified at the junctions of many of these duplications. Phylogenetic and comparative studies of pericentromeric sequences suggest that this peculiar genome organization has emerged within the last 30 million years of human evolution and is a source of considerable genomic variation between closely related primate species. Interestingly, not all human pericentromeric regions show this proclivity to duplicate and transpose genomic sequence, suggesting at least two different models for the organization of these regions.