Abstract

The Illumina Infinium HumanMethylation27 (Illumina 27K) BeadChip assay is a relatively recent high-throughput technology that allows over 27,000 CpGs to be assayed. Illumina 27K methylation data are used less commonly in bioinformatics than gene expression data. This creates a critical need to find the optimal feature ranking (FR) method for handling such high-dimensional data. The best FR method for a given classifier is not well known, and choosing the best-performing FR method becomes more challenging in the high-dimensional setting. Identifying the statistical methods that improve inference is therefore of crucial importance in this context. This paper describes in detail the performance of FR methods such as Fisher score, information gain, chi-square, and minimum redundancy maximum relevance on different classification methods such as AdaBoost, Random Forest, Naive Bayes, and Support Vector Machines. Through a simulation study and real data applications, we show that the Fisher score, when applied as an FR method with all the classifiers, achieved the best prediction accuracy with a notably small number of ranked features.

Highlights

  • DNA methylation (DNAm) is an important epigenetic mechanism [1] controlling direct modification of DNA, and abnormal DNA has been involved in the formation of diseases [2, 3]

  • We compare the performance of all feature ranking (FR) methods and classifiers through extensive simulation studies with three different scenarios; to further support the results, a real DNAm dataset is used, as described in Section 2

  • Minimum Redundancy Maximum Relevance (MRMR) and Information Gain (IG) do not show the best performance. To analyze the behavior of these FR methods on all the classifiers, we look at the corresponding accuracies and note that MRMR and IG behave similarly to the Support Vector Machines (SVM) classifier with all FR methods, i.e., as the number of ranked features increases from 30 to 300, the accuracy of all the classifiers increases
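To make the idea of ranking features before classification concrete, here is a minimal sketch of Fisher score ranking in plain Python. It assumes the common textbook form of the score (between-class scatter divided by within-class scatter, computed per feature); the paper's exact variant may differ. The function names `fisher_scores` and `top_k_features` and the Gaussian toy data are hypothetical illustrations, not the authors' code or the Illumina 27K data.

```python
import random
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per feature (column) of X.

    X: list of samples, each a list of feature values.
    y: list of class labels, one per sample.
    Assumed form: sum_c n_c (mean_c - mean)^2 / sum_c n_c var_c.
    """
    n_features = len(X[0])
    labels = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0
        within = 0.0
        for label in labels:
            values = [v for v, lab in zip(column, y) if lab == label]
            n_c = len(values)
            between += n_c * (fmean(values) - overall_mean) ** 2
            within += n_c * pvariance(values)
        # Small constant avoids division by zero for constant features.
        scores.append(between / (within + 1e-12))
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the largest Fisher scores."""
    scores = fisher_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Hypothetical toy data: 60 samples, 200 features, two balanced classes.
# Only the first 5 features are shifted in class 1; the rest are noise.
rng = random.Random(0)
y = [0] * 30 + [1] * 30
X = [
    [rng.gauss(0.0, 1.0) + (3.0 if (label == 1 and j < 5) else 0.0)
     for j in range(200)]
    for label in y
]

selected = top_k_features(X, y, k=5)
print(sorted(selected))
```

In a comparison like the one summarized above, the selected column indices would then be used to subset the data before fitting each classifier, with the number of retained features varied (for example from 30 to 300) to track how accuracy changes.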



Introduction

DNA methylation (DNAm) is an important epigenetic mechanism [1] controlling direct modification of DNA, and abnormal DNA has been involved in the formation of diseases [2, 3]. DNAm is a process in which a methyl group is added to the fifth carbon of a cytosine ring of the DNA molecule; the presence of 5-methylcytosine may change the activity around the DNA sequence. The methyl group can alter the transcription of genes [8] and may lead to tumor progression [9]. DNAm changes are linked with different types of diseases, including neurological disorders, cardiovascular disease, and cancer [10, 11, 12]. Studies on DNAm can help in biomarker identification and disease classification [13, 14].
