Abstract

Regions of DNA rich in CpG dinucleotides, also known as CpG islands, are often located upstream of the transcription start site in both tissue-specific and housekeeping genes. Overall, CpG dinucleotides are observed at a density of about 25% of the level expected from base composition alone, partially due to 5-methylcytosine decay (Bird, 1993). Since CpG dinucleotides typically occur with low frequency, CpG islands can be distinguished statistically in the genome. Our method of detecting CpG islands involves a heuristic algorithm employing classic changepoint methods and log-likelihood statistics. A Java applet has been created to allow for user interaction and visualization of the segmentation resulting from the changepoint analysis. The model is tested using several sequences obtainable from GenBank (NCBI, 1997), including a 220 kb fragment of the human X chromosome extending from the filamin (FLN) gene to the glucose-6-phosphate dehydrogenase (G6PD) gene, which has been experimentally studied (Rivella et al., 1995; Chen et al., 1996). Preliminary results suggest a breakpoint segmentation that is consistent with manual analysis. About 56% of human genes have associated CpG-rich islands (Antequera and Bird, 1993). By identifying the CpG islands, it is thought that regions of DNA coding for housekeeping or tissue-specific genes can be located (Antequera and Bird, 1993), even in the absence of transcriptional activity. Biological experiments searching for such genes can then be narrowed given the locations of the CpG islands.

COMPUTATIONAL DETECTION OF CpG ISLANDS IN DNA

Eric C. Rouchka, Richard Mazzarella, and David J. States
Institute for Biomedical Computing
Washington University
700 S. Euclid Avenue
Saint Louis, MO 63110
Ecr@ibc.wustl.edu
rich@borcim.wustl.edu
states@ibc.wustl.edu
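The two quantities mentioned in the abstract, the observed-to-expected CpG density and a log-likelihood changepoint, can be illustrated with a minimal sketch. This is not the authors' heuristic algorithm, which the abstract does not specify. It assumes the common observed/expected ratio (CpG count times sequence length, divided by the product of the C and G counts) and a single changepoint under a simple Bernoulli model of CpG occurrence. The function names `cpg_obs_exp`, `bernoulli_loglik`, and `best_changepoint` are hypothetical.

```python
import math


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: count(CG) * N / (count(C) * count(G)).

    Assumption: this widely used ratio stands in for the density
    measure mentioned in the abstract.
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * n / (c * g)


def bernoulli_loglik(k, n):
    """Maximised Bernoulli log-likelihood for k successes in n trials."""
    if n == 0 or k == 0 or k == n:
        return 0.0  # degenerate cases: the fitted likelihood equals 1
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def best_changepoint(seq):
    """Return (index, log-likelihood gain) of the best single split.

    Each position is flagged 1 if a CpG dinucleotide starts there.
    The gain compares a two-segment model against a one-segment model.
    Returns (None, 0.0) if no split improves the likelihood.
    """
    seq = seq.upper()
    flags = [1 if seq[i:i + 2] == "CG" else 0 for i in range(len(seq) - 1)]
    n, total = len(flags), sum(flags)
    base = bernoulli_loglik(total, n)
    best_idx, best_gain, left = None, 0.0, 0
    for i in range(1, n):
        left += flags[i - 1]  # number of CpGs in flags[0:i]
        gain = (bernoulli_loglik(left, i)
                + bernoulli_loglik(total - left, n - i)
                - base)
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain
```

Applying `best_changepoint` recursively to each resulting segment, with a threshold on the gain, would give a multi-segment segmentation in the spirit of classic changepoint methods; the threshold and stopping rule are left open here because the abstract does not state them.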
