Abstract

In its simplest terms, the human genome consists of two distinct fractions of DNA: repetitive and unique sequence. Traditionally, a portion of the unique fraction is thought to comprise the obvious functional constituents of our genome, including exons, introns, and regulatory DNA elements. With the exception of telomeric and centromeric repeat sequences, the functional significance of the vast majority of the repetitive fraction is less clear. Since the early experiments on the reassociation kinetics of single-stranded human DNA (Britten and Kohne 1968), various gradations of repetitiveness have been recognized on the basis of both copy number and degree of sequence similarity. The number of repeats ranges from the prolific (LINEs, SINEs, α-satellite, etc., in the 100,000s) to the relatively few. Because multigene families exist, genes themselves may be repetitive in nature. Many of the most well-studied members of gene families (hemoglobins and HOX genes), however, appear to be sufficiently divergent (Ohno 1970) or localize to discrete clusters of tandem arrays (rRNA genes, HLA genes, immunoglobulin gene segments). These are often distinguished based on the sequence divergence of individual members or their clustered position within the human genome. The term “unique” DNA, therefore, is relative, determined largely by what we already know about any given genome. The more of our genome that is sequenced, the more the total amount of “apparent” unique sequence will dwindle, with a concomitant burgeoning of the repeat classes. The basic paradigm regarding the repetitive and unique nature of DNA sequence underlies any effort to sequence a genome. In fact, the reason that any genome can be sequenced and assembled is that there is sufficient unique sequence interdigitated among the repetitive fraction, the repetitive fraction is sufficiently divergent, and/or the repetitive fraction can be distinguished as such.
A simple corollary exists among the sequencing community: The fewer and less complicated the repeats, the easier a genome is to sequence. At a recent National Institutes of Health (NIH) meeting entitled “Genomic Alterations in Genetic Disease: Mechanisms of Structural Rearrangement,” a much more complex picture of the organization of repeat sequences in the human genome emerged. Regions of the genome are being identified, conspicuously located within the subtelomeric and pericentromeric portions of chromosomes, that harbor large tracts (50–200 kb) of duplicated genomic segments exhibiting a remarkable degree of sequence similarity (95%–99%). Unlike “traditional” repeat elements, these segments appear to carry the complete or partial genomic structure of known genes, suggesting that they have recently been transposed from elsewhere in the genome. Therefore, they have the appearance of normal gene-encoding unique DNA and are not, at first glance, easily distinguished as repetitive sequences. Interestingly, many of these large genomic segments of paralogous (sequence similarity due to duplication) sequence were discovered on either side of the breakpoint clusters of well-known microdeletion/microduplication syndromes, such as Prader–Willi syndrome (PWS) in 15q11–13, Williams syndrome, Smith–Magenis syndrome (SMS) in 17p11.2, and velocardiofacial syndrome (VCFS) in 22q11.2, which suggests that they may have a role in mediating the aberrant recombination associated with instability in these regions. Our own recent estimate from available genomic sequence in GenBank (130.1 Mb) lends further support to this complexity in our genome. A total of 1.1 Mb of genomic sequence, encompassing 21 different genes, was identified that showed remarkable sequence identity (95%–98%) to other large genomic segments or other sequenced cDNAs mapping to different locations in the genome.
Most of these segments were identified among sequences mapping to the pericentromeric regions of chromosomes (2p11, 10p11, 15q11, 16p11, and 22q11), which suggests a hitherto unrecognized property of our genome to duplicate and transpose genomic segments to these regions. At the end of the NIH meeting, two general conclusions were reached regarding these complex repeat regions: (1) These repeat sequences are particularly difficult to resolve from the perspective of both mapping and sequencing; and (2) the sequence and organization of these repeat regions will be critical in understanding the process of genomic instability and disease in these regions.
