Abstract

Feature selection methods for cancer classification aim to overcome the high dimensionality of biomedical data, which is a challenging task. Most feature selection methods based on DNA methylation are time-consuming during the testing phase, when they identify the subset of features most relevant to accurate prediction. Hybridizing feature selection with feature extraction, however, yields a method that is much faster than feature selection alone. This paper proposes a framework that combines novel feature selection methods, which employ statistical variation (in terms of standard deviation) and entropy, with extraction methods to predict cancer using three new features: Hypomethylation, Midmethylation and Hypermethylation. These new features represent the average methylation density of the corresponding three regions. The three features are extracted from the selected features based on an analysis of methylation behavior. The effectiveness of the proposed framework is evaluated by breast cancer classification accuracy. The framework achieves 98.85% accuracy using only three features out of 485,577 features. This result demonstrates the capability of the proposed approach for breast cancer diagnosis and confirms that feature selection and extraction methods are critical for practical implementation.

Highlights

  • Cancer is a leading cause of death worldwide; it begins when cells in a part of the body start to grow out of control

  • This article proposes a framework that combines novel feature selection methods with extraction methods to identify the informative probes that underlie the pathogenesis of tumor cell proliferation and to improve cancer classification accuracy

  • The proposed feature selection method DV1 uses statistical variation, in terms of the standard deviation, to obtain a discriminative value, while the other proposed method, DV2, uses entropy to rank features and retains the more variable features whose values involve less uncertainty
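The two ranking ideas named in the highlights can be illustrated with a minimal sketch. This summary does not give the exact DV1 or DV2 formulas, so the scores below are assumptions: `std_score` is a hypothetical stand-in that divides the gap between class means by the summed class standard deviations, and `entropy_score` is the Shannon entropy of each probe's histogrammed beta values. The function names, the bin count, and the assumption that beta values lie in [0, 1] are all illustrative.

```python
import numpy as np

def std_score(X, y):
    """Hypothetical standard-deviation-based score (not the paper's DV1 formula).

    X: (samples, probes) array of beta values; y: 0 = normal, 1 = tumor.
    """
    tumor, normal = X[y == 1], X[y == 0]
    gap = np.abs(tumor.mean(axis=0) - normal.mean(axis=0))
    spread = tumor.std(axis=0) + normal.std(axis=0)
    return gap / (spread + 1e-12)  # small constant avoids division by zero

def entropy_score(X, bins=10):
    """Shannon entropy (in bits) of each probe's values, histogrammed over [0, 1]."""
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=bins, range=(0.0, 1.0))
        p = counts[counts > 0] / counts.sum()
        scores[j] = -(p * np.log2(p)).sum()
    return scores

def top_k(scores, k, largest=True):
    """Indices of the k highest (or lowest) scoring probes."""
    order = np.argsort(scores)
    return order[::-1][:k] if largest else order[:k]
```

Whether an entropy-based ranking should keep high-entropy or low-entropy probes depends on the full definition of DV2, so `top_k` leaves the direction as a parameter.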

Summary

INTRODUCTION

Cancer is a leading cause of death worldwide; it begins when cells in a part of the body start to grow out of control. Recent research provides increasing evidence that epigenetic modifications play a critical role in human cancer. Feature selection and extraction techniques have therefore become essential in cancer prediction for identifying the informative probes that underlie the pathogenesis of tumor cell proliferation. We propose a framework based on feature selection and extraction methods that removes irrelevant information and improves cancer classification accuracy based on DNA methylation data. A novel feature selection method based on statistical variation and standard deviation identifies a small set of discriminative methylated DNA probes; the average methylation density of three regions (hypomethylation, midmethylation and hypermethylation) is then calculated as new extracted features to predict cancer.
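The extraction step can be sketched as follows, under one plausible reading of the summary: each sample's selected probes are split into three regions by beta value, and the mean of each region becomes one feature. The cutoffs 0.3 and 0.7, the function name `region_features`, and the choice to return 0 for an empty region are assumptions made for illustration; the summary does not state how the three regions are delimited.

```python
import numpy as np

def region_features(X, low=0.3, high=0.7):
    """Per-sample mean beta value in three assumed regions.

    X: (samples, selected_probes) array of beta values in [0, 1].
    Returns an array of shape (samples, 3): [hypo, mid, hyper].
    The cutoffs `low` and `high` are hypothetical, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    masks = (X < low, (X >= low) & (X <= high), X > high)
    feats = []
    for m in masks:
        total = np.where(m, X, 0.0).sum(axis=1)  # sum of values inside the region
        count = m.sum(axis=1)                    # number of probes inside the region
        # Mean per sample; samples with no probe in the region get 0.
        feats.append(np.divide(total, count,
                               out=np.zeros_like(total), where=count > 0))
    return np.column_stack(feats)
```

The resulting three-column matrix can be passed to any standard classifier in place of the full probe matrix, which is where the reduction in testing cost described above would come from.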

RELATED WORKS
Dataset
Proposed Framework
Feature Selection Methods
Feature Extraction Method
Classification
RESULTS
Method
Findings
CONCLUSION AND FUTURE WORK
