Abstract

Background: Tandemly repeated DNA represents a large portion of the human genome and accounts for a significant amount of copy number variation. Here we present a genome-wide analysis of the largest tandem repeats found in the human genome sequence.

Results: Using Tandem Repeats Finder (TRF), tandem repeat arrays greater than 10 kb in total size were identified and classified into simple sequence (e.g. GAATG), classical satellites (e.g. alpha satellite DNA), and locus-specific VNTR arrays. Analysis of these large sequenced regions revealed that several "simple sequence" arrays actually showed complex domain and/or higher-order repeat organization. Using additional methods, we identified a total of 96 additional arrays with tandem repeat units greater than 2 kb (the detection limit of TRF), 53 of which contained genes or repeated exons. An array of tandem 12 kb repeats that spans a gap on chromosome 8 was found to be 600 kb to 1.7 Mbp in overall size, representing one of the largest non-centromeric arrays characterized. Several novel megasatellite tandem DNA families were observed that are characterized by repeating patterns of interspersed transposable elements and that have presumably expanded by unequal crossing over. One of these families is found on 11 different chromosomes in >25 arrays and represents one of the largest and most widespread megasatellite DNA families.

Conclusion: This study represents the most comprehensive genome-wide analysis of large tandem repeats in the human genome, and will serve as an important resource towards understanding the organization and copy number variation of these complex DNA families.

Highlights

  • Repeated DNA represents a large portion of the human genome, and accounts for a significant amount of copy number variation

  • Tandemly repeated DNA is organized as multiple copies of a homologous DNA sequence of a certain size, arranged head to tail to form tandem arrays, and represents a distinct type of sequence organization shared by all sequenced genomes

  • Classical satellite repetitive DNA in the human genome: we performed a bioinformatics analysis of the tandemly repeated DNA using the output from Tandem Repeats Finder (TRF) run against hg18 [6] (http://tandem.bu.edu/cgi-bin/trdb/trdb.exe), which reports 947,696 arrays containing tandem repeats ranging in size from 2 to 2000 bp [7]
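The size filter described above (keeping only arrays greater than 10 kb in total size) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `RepeatArray` record, its field names, the 1-based inclusive coordinate convention, and the example coordinates are all assumptions made up for illustration, and the sketch does not parse real TRF output.

```python
from dataclasses import dataclass

# Threshold taken from the abstract: arrays greater than 10 kb in total size.
MIN_ARRAY_BP = 10_000


@dataclass(frozen=True)
class RepeatArray:
    """Hypothetical, simplified record for one tandem repeat array."""
    chrom: str
    start: int   # assumed 1-based, inclusive
    end: int     # assumed 1-based, inclusive
    period: int  # repeat unit length in bp

    @property
    def length(self) -> int:
        # Total span of the array in bp under the inclusive convention.
        return self.end - self.start + 1


def large_arrays(arrays, min_bp=MIN_ARRAY_BP):
    """Return the arrays whose total span is strictly greater than min_bp."""
    return [a for a in arrays if a.length > min_bp]


# Made-up coordinates, for illustration only.
example = [
    RepeatArray("chr1", 1_000, 1_500, 5),         # 501 bp: dropped
    RepeatArray("chr8", 200_000, 215_000, 171),   # 15,001 bp: kept
    RepeatArray("chrX", 50_000, 60_000, 12),      # 10,001 bp: kept
]
selected = large_arrays(example)
```

Classification into simple sequence, classical satellite, and locus-specific VNTR arrays would need additional information, such as the consensus sequence and genomic context, that this sketch does not model.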



Introduction

Repeated DNA represents a large portion of the human genome and accounts for a significant amount of copy number variation. Tandem repeats have been shown to play a role in paramutation in maize [2,3] and in FWA gene regulation in Arabidopsis [4]. Overall, many of these functions appear to involve RNA interference-mediated chromatin modifications [5,3].

