Abstract
BackgroundAs crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nuclesome retention. The generally accepted criteria of CGI rely on: (a) %G+C content is ≥ 50%, (b) the ratio of the observed CpG content and the expected CpG content is ≥ 0.6, and (c) the general length of CGI is greater than 200 nucleotides. Most existing computational methods for the prediction of CpG island are programmed on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpG obs/CpG exp varies, and the length of CGI ranges from eight nucleotides to a few thousand of nucleotides. It implies that CGI detection is not just a straightly statistical task and some unrevealed rules probably are hidden.ResultsA novel Gaussian model, GaussianCpG, is developed for detection of CpG islands on human genome. We analyze the energy distribution over genomic primary structure for each CpG site and adopt the parameters from statistics of Human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection.ConclusionsOur Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by the linear statistical method but by the Gaussian energy distribution and accumulation. The parameters of Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. By using the pseudopotential analysis on CpG islands, the novel model is validated on both the real and artificial data sets.
Highlights
As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nuclesome retention
DNA methylation usually occurs in CpG islands and adds an additional methyl to cytosine such that the gene silencing may be caused by the additional methyl
Assumptions In order to simplify the microscopical interactions in the DNA genome and reflect the general principles of the complex system, we propose the Gaussian model based on the following assumptions: (a) Each CpG site preserves the potential energy and the CpG-rich regions where energy are highly aggregated have more potential opportunities for epigenetic events. (b) Each CpG island is regarded as an energy field where only the contained CpG sites can affect mutually. (c) The energy of each CpG site is closely related to its primary structure or secondary/ tertiary structures
Summary
As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nuclesome retention. Experiments indicate that in many cases %G+C is < 50%, CpGobs/CpGexp varies, and the length of CGI ranges from eight nucleotides to a few thousand of nucleotides It implies that CGI detection is not just a straightly statistical task and some unrevealed rules probably are hidden. CpG contents in the entire human DNA genome are generally suppressed to only around 1% comparing with other combinations [1] Scientists find that it is in CpG islands where many biological processes occur closely related with high density of CpG contents [2]. DNA methylation usually occurs in CpG islands and adds an additional methyl to cytosine such that the gene silencing may be caused by the additional methyl This subtle process can further give rise to gene regulatory differentiation and various epigenetic issues. Conventional bisulfite modification-based methods to determine CpG islands and methylation regions are time-consuming [3]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.