Abstract

Background: CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located at the 5' end of genes and are considered gene markers. Hackenberg et al. (2006) recently developed a new algorithm, CpGcluster, which takes a mathematical approach entirely different from that of earlier, traditional algorithms. Their evaluation suggests that CpGcluster is a much more efficient way to detect functional clusters, or islands, of CpGs.

Results: We systematically compared CpGcluster with the traditional algorithm of Takai and Jones (2002). Our comparisons of (1) the number of islands versus the number of genes in a genome, (2) the distribution of islands across genomic regions, (3) island length, (4) the distance between neighboring islands, and (5) methylation status suggest that, overall, Takai and Jones' algorithm is more appropriate for identifying promoter-associated CpG islands in vertebrate genomes.

Conclusion: The generation of genome sequence and DNA methylation data is expected to accelerate greatly. The results of this study are therefore broadly useful for gene feature analysis and epigenomics, including gene prediction and methylation chip design in different genomes.
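Two of the comparisons listed in the abstract, island length and the distance between neighboring islands, can be computed directly from island coordinates. The Python sketch below is illustrative only and is not the authors' code. It assumes that each island set is supplied as hypothetical 0-based, half-open (start, end) pairs on a single chromosome with no overlaps, and it measures distance as the gap between the end of one island and the start of the next.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open: an assumption of this sketch


def island_lengths_and_gaps(intervals: List[Interval]) -> Tuple[List[int], List[int]]:
    """Return island lengths and distances between neighboring islands.

    Assumes all intervals lie on one chromosome and do not overlap.
    """
    ordered = sorted(intervals)  # sort by start coordinate
    lengths = [end - start for start, end in ordered]
    # Distance = start of the next island minus end of the previous one.
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return lengths, gaps


# Hypothetical toy coordinates, not data from the study.
lengths, gaps = island_lengths_and_gaps([(5000, 6200), (1000, 1800)])
print(lengths, gaps)  # [800, 1200] [3200]
```

For genome-wide comparisons, the same function would be applied per chromosome, and the resulting length and gap distributions for CGIs and CGCs compared.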

Highlights

  • CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located in the 5' end of genes and considered gene markers

  • CpGcluster has slightly better sensitivity. Table 1 shows statistics for CGIs identified by Takai and Jones' algorithm and for CpG clusters (CGCs) identified by CpGcluster in the human and mouse genomes

  • CGIs are much longer than CGCs


Summary

Introduction

CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located at the 5' end of genes and are considered gene markers. Traditional algorithms rely on three sequence parameters, namely length, GC content, and the ratio of observed to expected CpG dinucleotides (ObsCpG/ExpCpG), which were originally proposed by Gardiner-Garden and Frommer in 1987 [2] and later revised by Takai and Jones [3] and others. These algorithms have been widely used to identify CGIs in numerous studies. Takai and Jones' stringent algorithm [3] appears to outperform the others because it effectively excludes short interspersed elements such as Alu and identifies CGIs that are more likely to be associated with the 5' regions of genes [3].
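The three-parameter test can be sketched in Python as follows. This is a simplified illustration, not Takai and Jones' published procedure: it only checks whether a single candidate sequence meets the thresholds, whereas the published algorithms scan a genome with sliding windows and merge qualifying windows. The function and constant names are hypothetical, ObsCpG/ExpCpG is computed with the commonly used formula (CpG count × length) / (C count × G count), and the thresholds are the values usually cited for each method (Gardiner-Garden and Frommer: ≥200 bp, GC ≥ 50%, ObsCpG/ExpCpG ≥ 0.6; Takai and Jones: ≥500 bp, GC ≥ 55%, ObsCpG/ExpCpG ≥ 0.65).

```python
# Hypothetical names; thresholds are the commonly cited published values.
# Each tuple is (min length in bp, min GC fraction, min ObsCpG/ExpCpG).
GARDINER_GARDEN_FROMMER = (200, 0.50, 0.60)
TAKAI_JONES = (500, 0.55, 0.65)


def obs_exp_cpg(seq: str) -> float:
    """ObsCpG/ExpCpG = (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def meets_cgi_criteria(seq: str, min_len: int, min_gc: float, min_oe: float) -> bool:
    """Check one candidate region against length, GC-content and Obs/Exp thresholds."""
    seq = seq.upper()
    if not seq:
        return False
    gc = (seq.count("C") + seq.count("G")) / len(seq)
    return len(seq) >= min_len and gc >= min_gc and obs_exp_cpg(seq) >= min_oe


toy = "CG" * 150  # hypothetical 300 bp region
print(meets_cgi_criteria(toy, *GARDINER_GARDEN_FROMMER))  # True
print(meets_cgi_criteria(toy, *TAKAI_JONES))  # False: shorter than 500 bp
```

With this toy input, a 300 bp run of CG repeats satisfies the Gardiner-Garden and Frommer thresholds but fails the Takai and Jones length requirement, which illustrates how the stricter criteria discard shorter candidate regions.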

