Abstract

Background: CpG islands have been demonstrated to influence local chromatin structures and simplify the regulation of gene activity. However, the accurate and rapid determination of CpG islands for whole DNA sequences remains experimentally and computationally challenging.

Methodology/Principal Findings: A novel procedure is proposed to detect CpG islands by combining clustering technology with the sliding-window method based on particle swarm optimization (PSO). Clustering technology is used to detect the locations of all possible CpG islands and to process the data, which obviates the extensive and unnecessary processing of DNA fragments and thus improves the efficiency of the sliding-window based PSO search. The proposed approach, named ClusterPSO, provides versatile and highly sensitive detection of CpG islands in the human genome. The detection efficiency of ClusterPSO was compared with that of eight other CpG island detection methods in the human genome. In terms of sensitivity, specificity, accuracy, performance coefficient (PC), and correlation coefficient (CC), ClusterPSO showed superior detection ability among all of the tested methods. Moreover, the combination of clustering technology and the PSO method can successfully overcome their respective drawbacks while maintaining their advantages. Thus, clustering technology could be hybridized with an optimization algorithm to optimize CpG island detection.

Conclusion/Significance: The prediction accuracy of ClusterPSO was quite high, indicating that the combination of CpGcluster and PSO has several advantages over CpGcluster or PSO alone. In addition, ClusterPSO significantly reduced implementation time.

Highlights

  • CpG islands are short genomic sequences with high concentrations of cytosine (C) and guanine (G) nucleotides that contain CpG dinucleotides (CpGs)

  • All contig sequences and CpG islands were verified based on sequence analysis and bisulphite sequencing (BS-seq) and were obtained from NCBI, along with the entire human genome (NCBI.36)

  • For all six contig sequences, the comparisons of ClusterPSO against each of the other eight methods yielded p < 0.0001

Summary

Introduction

CpG islands are short genomic sequences with high concentrations of cytosine (C) and guanine (G) nucleotides that contain CpG dinucleotides (CpGs). In 2002, Takai and Jones proposed a rigorous CpG island definition [2]: a minimum length of 500 bp, a GC content of at least 55%, and an observed/expected (O/E) CpG ratio of at least 0.65. The 500 bp minimum length is intended to exclude Alu sequences from CpG islands. An Alu sequence is a highly repetitive short interspersed element with a consensus sequence of about 280 bp that exhibits high GC content and a high O/E ratio. An accurate method for CpG island detection could be useful in research on drugs, cancer, and genomic markers.

Editor: Valerie W Hu, The George Washington University, UNITED STATES
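The Takai and Jones thresholds described in the Introduction can be illustrated with a minimal Python sketch. It is not the ClusterPSO procedure, only a check of a single candidate fragment against the three criteria. The function names are hypothetical. The O/E ratio is assumed to follow the commonly used form (number of CpG × sequence length) / (number of C × number of G), which the text above does not spell out, and the thresholds are assumed to be inclusive minimums.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio.

    Assumption: O/E = (#CpG * N) / (#C * #G), a commonly used form;
    the source text does not state the formula explicitly.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(seq) / (n_c * n_g)


def meets_takai_jones(seq: str,
                      min_len: int = 500,
                      min_gc: float = 0.55,
                      min_oe: float = 0.65) -> bool:
    """True if seq satisfies all three thresholds (treated as inclusive minimums)."""
    return (len(seq) >= min_len
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_oe)


if __name__ == "__main__":
    candidate = "CG" * 250   # 500 bp toy fragment, entirely CpG
    alu_like = "CG" * 140    # 280 bp: GC-rich but below the length threshold
    print(meets_takai_jones(candidate))
    print(meets_takai_jones(alu_like))
```

The second toy fragment shows why the length criterion matters: a GC-rich fragment of about 280 bp can pass the GC content and O/E thresholds and is rejected only because it is shorter than 500 bp.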
