Repeat-rich Regions Research Articles

A long-standing observation in genomes from bacteria to humans is that the level of nucleotide diversity varies from region to region within the genome. At some spots the sequence is virtually identical among all individuals in a population, while at others variation abounds. What accounts for these differences in variability across the genome? In this issue of PLoS Biology, Michael McDonald, Jun-Yi Leu, and colleagues provide evidence that one prominent hypothesis doesn’t explain all the facts, while another, less popular one does. Their work concerns how insertions and deletions (“indels”) contribute to sequence variability within a region, and how important indels are relative to other factors.

In the most widely held model, the “mutagenic indel” hypothesis, a heterozygous indel causes the DNA repair machinery to sprinkle the surrounding region with substitutions (the essence of sequence variability) as it attempts to correct the mismatch. A key prediction of this hypothesis is that, because the repair machinery is only called into play when the sequences on homologous chromosomes differ, the accumulation of substitutions should stop once the indel becomes homozygous in the population (i.e., all individuals carry it on both chromosomes), since there is nothing left to repair. In contrast, the “regional differences” hypothesis posits that substitutions arise from peculiarities of the local genomic environment, independent of the presence or heterozygosity of indels. Under this model, a region should keep accumulating substitutions whether or not the indel is homozygous in the population, or even present at a particular spot.

The authors began their test of the mutagenic indel hypothesis by examining nucleotide diversity in a prokaryote, the gut bacterium E. coli. Prokaryotes are haploid; paired chromosomes exist for only a brief period during the life cycle, which severely limits the opportunity for diploidy-based DNA repair. Because the mutagenic indel hypothesis depends on this repair, it predicts that substitutions should have little opportunity to accumulate around bacterial indels over time, as they do in diploid eukaryotes; younger indels should therefore be surrounded by just as many substitutions as older ones. Instead, the authors found that older E. coli indels were surrounded by many more substitutions, suggesting that substitutions continue to accumulate around indels over time despite the absence of diploidy and its associated repair mechanisms.

A second test compared nucleotide diversity in regions with indels to that in comparable regions without them. If indels promote substitutions (and therefore diversity), the region surrounding an indel should be more diverse, and the region without it should be no more diverse than expected from the background rate of sequence change. This comparison is trickier than it sounds, because it requires knowing which of two similar sequences actually acquired the indel. The authors compared similar regions in two bacterial strains and used the sequence from a third strain to infer the ancestral sequence. Contrary to the mutagenic indel hypothesis, they found that diversity in both sequences was elevated above the background, and that the sequence without the indel was just as diverse as the sequence with it. Furthermore, indels had an acute effect around the time of mutation, but that effect diminished over time, whereas the regional effect persisted. This suggested that the sequence of the region, not the presence of the indel, controlled the level of diversity. What was true for bacteria was also true for yeast and flies: indels caused a one-time spike in diversity, while the effect of the region was constant.

So what characteristics of a region make it prone to accumulating diversity? Repeat sequences are well known to cause indels, as the replication machinery slips and misaligns the two strands. But the authors propose a different mechanism to account for the substitutions surrounding repeats. They note that two of E. coli’s five DNA polymerases are error-prone. Such polymerases are recruited when replication stalls or when DNA breaks, events often caused by repetitive DNA sequences. Thus, apart from causing indels, repetitive DNA should induce multiple nearby substitutions as these polymerases take over. By searching for repeat-rich regions in multiple E. coli genomes, they found that diversity was highest around repeat sequences even when there were no

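The ancestral-state inference used in the second test can be illustrated with a simple parsimony rule. The sketch below is a minimal illustration under stated assumptions, not the authors' actual pipeline: it assumes three sequences already aligned to the same length (the two strains being compared plus a third strain as outgroup), treats the outgroup base as ancestral wherever the two strains differ, and then counts lineage-specific substitutions within a fixed window of an indel. The function names `polarize_substitutions` and `count_near`, the toy sequences, and the window size are hypothetical.

```python
def polarize_substitutions(a: str, b: str, outgroup: str) -> dict[str, list[int]]:
    """Assign each difference between strains a and b to the lineage where it arose.

    Sketch assumption: the three sequences are aligned to equal length and "-"
    marks a gap. Where a and b differ, the outgroup base is taken as ancestral
    (simple parsimony). Gap columns, and columns where neither strain matches
    the outgroup, are skipped.
    """
    if not (len(a) == len(b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    hits: dict[str, list[int]] = {"a": [], "b": []}
    for i, (x, y, z) in enumerate(zip(a, b, outgroup)):
        if x == y or "-" in (x, y, z):
            continue  # no difference between strains, or an indel column
        if y == z:
            hits["a"].append(i)  # b kept the ancestral base, so a changed
        elif x == z:
            hits["b"].append(i)  # a kept the ancestral base, so b changed
    return hits


def count_near(positions: list[int], center: int, window: int) -> int:
    """Count substitution positions within +/- window alignment columns of center."""
    return sum(1 for p in positions if abs(p - center) <= window)


if __name__ == "__main__":
    a = "ACGTTCGTA--TACGAACGT"    # hypothetical strain A, gapped at columns 9-10
    b = "ACGTACGTACGTGCGTACCT"    # hypothetical strain B, no indel
    out = "ACGTACGTACGTACGTACGT"  # hypothetical third strain used as outgroup
    hits = polarize_substitutions(a, b, out)
    print(hits)  # {'a': [4, 15], 'b': [12, 18]}
    # Lineage-specific substitutions within 6 columns of the indel position.
    print(count_near(hits["a"], 9, 6), count_near(hits["b"], 9, 6))  # 2 1
```

Comparing the two lineages' counts around the same alignment position mirrors the with-indel versus without-indel comparison described above. A single-outgroup parsimony rule is a simplification: it cannot detect repeated substitutions at one site or changes on the outgroup's own branch, and columns where all three bases differ are simply dropped.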

Background: Repeat-rich regions such as centromeres receive less attention than their gene-rich euchromatic counterparts because they are difficult to assemble and analyze. Our objectives were to (1) map all ten centromeres onto the maize genetic map and (2) characterize the sequence features of maize centromeres, each of which spans several megabases of highly repetitive DNA. Repetitive sequences can be mapped using special PCR-based molecular markers with primers designed from two unique “repeat junctions”. Efficiently screening large amounts of maize genome sequence data for repeat junctions and key centromere sequence features required the development of dedicated annotation software.

Results: We developed JunctionViewer to automate the process of identifying and differentiating closely related centromere repeats and repeat junctions, and to generate graphical displays of these and other features within centromeric sequences. JunctionViewer generates NCBI BLAST, WU-BLAST, cross_match and MUMmer alignments, and displays the optimal alignments and additional annotation data as concise graphical representations that can be viewed directly through the graphical interface or as PostScript® output. This software enabled us to quickly characterize millions of nucleotides of newly sequenced DNA, ranging in size from single reads to assembled BACs and megabase-sized pseudochromosome regions. It expedited the generation of repeat junction markers that were subsequently used to anchor all ten centromeres to the maize map. It also enabled us to efficiently identify key features in large genomic regions, providing insight into the arrangement and evolution of maize centromeric DNA.

Conclusions: JunctionViewer will be useful to scientists who wish to automatically generate concise graphical summaries of repeat sequences. It is particularly valuable for those who need to efficiently identify unique repeat junctions. Its scalability and its customizable homology search parameters for different classes of closely related repeat sequences make it well suited to recurring annotation (e.g., of genome projects that are still in progress) of genomic regions that contain well-defined repeats, such as those in centromeres. Although it was originally customized for maize centromere sequence, we anticipate that this software will facilitate the analysis of centromeres and other repeat-rich regions in other organisms.

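The abstract does not describe how JunctionViewer locates junctions internally. As a hedged sketch of the general idea only, the Python below reads repeat hits on a query sequence from BLAST+ tabular output (`-outfmt 6`, in which columns 7 and 8 are the query start and end), orders them along the query, and reports each point where one repeat gives way to a different repeat or to unannotated sequence, together with flanking sequence that could serve as a template for junction-specific primers. The `Hit` and `Junction` classes, the function names, the `flank` parameter, and the repeat IDs `repA`/`repB` in the example are illustrative assumptions, not part of JunctionViewer.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One repeat hit on the query, taken from a BLAST+ -outfmt 6 row."""
    repeat: str  # sseqid: the repeat consensus or family that matched
    qstart: int  # 1-based start on the query
    qend: int    # 1-based end on the query (inclusive)


@dataclass
class Junction:
    pos: int        # last base (1-based) of the left-hand element
    left: str       # label of the element ending at pos
    right: str      # label of the element starting at pos + 1
    flank_seq: str  # sequence spanning the junction, e.g. for primer design


def parse_outfmt6(lines):
    """Parse default -outfmt 6 rows; columns 7 and 8 hold qstart and qend."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        hits.append(Hit(fields[1], min(qstart, qend), max(qstart, qend)))
    return hits


def find_junctions(hits, sequence, flank=20):
    """Report boundaries between different repeats, or between repeat and unique DNA.

    Assumes all hits lie on one query and that no hit is nested inside another.
    Abutting or overlapping hits to the same repeat are treated as one element;
    an overlap between different repeats is reported at the end of the
    left-hand hit, which is only approximate.
    """
    def make(pos, left, right):
        lo, hi = max(0, pos - flank), min(len(sequence), pos + flank)
        return Junction(pos, left, right, sequence[lo:hi])

    ordered = sorted(hits, key=lambda h: (h.qstart, h.qend))
    junctions = []
    for left, right in zip(ordered, ordered[1:]):
        if right.qstart > left.qend + 1:
            # Unannotated (putatively unique) DNA separates the hits: two junctions.
            junctions.append(make(left.qend, left.repeat, "unique"))
            junctions.append(make(right.qstart - 1, "unique", right.repeat))
        elif left.repeat != right.repeat:
            # Two different repeats abut or overlap: one repeat-repeat junction.
            junctions.append(make(left.qend, left.repeat, right.repeat))
    return junctions


if __name__ == "__main__":
    seq = "A" * 30 + "C" * 30 + "G" * 40  # toy 100-bp query
    rows = [
        "ctg1\trepA\t98.0\t30\t0\t0\t1\t30\t1\t30\t1e-10\t55.0",
        "ctg1\trepB\t97.0\t30\t1\t0\t31\t60\t1\t30\t1e-10\t50.0",
        "ctg1\trepA\t99.0\t30\t0\t0\t71\t100\t30\t1\t1e-10\t56.0",
    ]
    for junction in find_junctions(parse_outfmt6(rows), seq, flank=5):
        print(junction)
```

In practice, hits would first be filtered by score and their overlaps resolved before junctions are called; the toy example yields a repA/repB junction at position 30 and repeat/unique junctions at positions 60 and 70.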

Articles published on Repeat-rich Regions

Unusually effective microRNA targeting within repeat-rich coding regions of mammalian mRNAs

For Genomes, Repetition Breeds Diversity

Efficient algorithms for tandem copy number variation reconstruction in repeat-rich regions

Association between simple sequence repeat-rich chromosome regions and intergenomic translocation breakpoints in natural populations of allopolyploid wild wheats

Massively parallel sequencing of ataxia genes after array-based enrichment

JunctionViewer: customizable annotation software for repeat-rich genomic regions

Distinct retroelement classes define evolutionary breakpoints demarcating sites of evolutionary novelty

Cancer DNA Methylation: Molecular Mechanisms and Clinical Implications

Identification and characterization of small RNAs from vernalized Arabidopsis thaliana

Global heterochromatic colocalization of transposable elements with minisatellites in the compact genome of the pufferfish Tetraodon nigroviridis

The genome of the Cryptophlebia leucotreta granulovirus

Development of markers for simple sequence repeat-rich regions that discriminate between Pisolithus albus and P. microcarpus

Heterochromatin in interphase nuclei of Arabidopsis thaliana

Genetic and Physical Mapping of Two Centromere-Proximal Regions of Chromosome IV in Aspergillus nidulans

A BAC-based physical map of the major autosomes of Drosophila melanogaster
