Abstract

Advances in technology have made it possible to classify methylated and unmethylated CpG islands using subsets of features derived from DNA sequence properties. Methylation of CpG islands is involved in various biological phenomena, and DNA methylation is associated with various human diseases such as cancer. Analysis of CpG islands has therefore become important for characterising and modelling biological phenomena and for understanding the mechanisms of such diseases. However, analysis of CpG island data is a relatively new and challenging subject in bioinformatics, systems biology and epigenetics. This paper addresses the problem of predicting methylated and unmethylated CpG islands on human chromosome 21q. The prediction uses a data set of 132 CpG island samples from human peripheral blood leukocytes on chromosome 21q. For each sample, 44 attributes are extracted, grouped into 4 feature subsets that characterise the methylated and unmethylated groups. Because the data set is unbalanced, the LOO method is modified to incorporate the m-fold cross-validation approach. This avoids the drawbacks of both traditional leave-one-out (LOO) and m-fold cross-validation. A K-nearest neighbour (KNN) classifier is then adapted for the prediction. The results of 440 separate analyses show that methylated CpG islands can be distinguished from unmethylated CpG islands with a predictive accuracy of between 75% and 80%. More importantly, the modified LOO produces clearer and more reliable results when the feature subsets are combined. Another notable observation is that, under the modified LOO, the CpGI-specific feature set achieves the highest predictive accuracy when combined with the other feature sets. This is not the case under the traditional LOO.
This further supports the robustness of the modified-LOO cross-validation approach, because other studies have shown the CpGI-specific feature set to be among the most important and effective attributes.
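The abstract compares traditional LOO with m-fold cross-validation for a KNN classifier. The sketch below shows those two baseline schemes. It is not the authors' modified LOO, because the abstract does not give the details of that method.

Everything in the sketch is an assumption made for illustration:
- the Euclidean distance;
- the value of k;
- the class-stratified fold assignment;
- the function names;
- the toy two-cluster data. The labels "M" and "U" stand for methylated and unmethylated, but the points are not real CpG island features.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Hypothetical helper: majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Traditional leave-one-out: each sample is held out once and predicted from the rest."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / len(X)


def stratified_folds(y, m, seed=0):
    """Assumed stratification: deal each class's shuffled indices round-robin into m folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(m)]
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % m].append(i)
    return folds


def mfold_accuracy(X, y, m=5, k=3, seed=0):
    """m-fold cross-validation: each fold is held out once and predicted from the other folds."""
    correct = 0
    for fold in stratified_folds(y, m, seed):
        held = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held]
        train_y = [t for i, t in enumerate(y) if i not in held]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
    return correct / len(X)


# Toy, unbalanced two-class data (illustrative only, not CpG island features).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5)]
y = ["U", "U", "U", "U", "M", "M", "M"]
print(loo_accuracy(X, y), mfold_accuracy(X, y, m=3))  # → 1.0 1.0
```

Stratifying the folds keeps every class represented in each fold, which is one common way to handle an unbalanced data set. Whether the paper's modified LOO does something similar cannot be determined from the abstract alone.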
