Abstract
Feature selection methods for cancer classification aim to overcome the high dimensionality of biomedical data, which is a challenging task. Most feature selection methods based on DNA methylation are time-consuming in the testing phase, where they must identify the subset of pertinent features relevant to accurate prediction. However, hybridizing feature selection with feature extraction yields a method that is much faster than feature selection alone. This paper proposes a framework that combines two novel feature selection methods, one based on statistical variation (standard deviation) and one based on entropy, with extraction methods to predict cancer using three new features: Hypomethylation, Midmethylation and Hypermethylation. These new features represent the average methylation density of the corresponding three regions, and they are extracted from the selected features based on an analysis of the methylation behavior. The effectiveness of the proposed framework is evaluated by breast cancer classification accuracy. The results show 98.85% accuracy using only three features out of 485,577. This result demonstrates the capability of the proposed approach for breast cancer diagnosis and confirms that feature selection and extraction methods are critical for practical implementation.
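As a rough illustration of the extraction step, the sketch below reduces one sample's beta values at the already-selected probes to the three region features. It assumes that "average methylation density" means the mean beta value of the probes falling in each region; the cut-offs `HYPO_MAX = 0.3` and `HYPER_MIN = 0.7`, the zero score for an empty region, and the name `extract_region_features` are hypothetical, not values or names taken from the paper.

```python
from statistics import mean

# Hypothetical region cut-offs on the beta-value scale [0, 1] (not from the paper).
HYPO_MAX = 0.3   # beta below this counts as hypomethylated
HYPER_MIN = 0.7  # beta at or above this counts as hypermethylated

def extract_region_features(betas, hypo_max=HYPO_MAX, hyper_min=HYPER_MIN):
    """Reduce one sample's selected-probe beta values to three features:
    the mean beta in the hypo-, mid- and hypermethylated regions.
    An empty region is scored 0.0 (an assumption)."""
    hypo = [b for b in betas if b < hypo_max]
    mid = [b for b in betas if hypo_max <= b < hyper_min]
    hyper = [b for b in betas if b >= hyper_min]
    return tuple(mean(region) if region else 0.0 for region in (hypo, mid, hyper))

# Five selected probes for one sample are reduced to three features.
features = extract_region_features([0.05, 0.15, 0.50, 0.60, 0.90])
print(features)  # approximately (0.1, 0.55, 0.9)
```

However many probes survive selection, each sample ends up as a fixed-length vector of three numbers, which is what makes the final classifier cheap to train and apply.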
Highlights
Cancer is a leading cause of death worldwide; it begins when some cells in a part of the body start to grow out of control
This article proposes a framework that combines novel feature selection methods with extraction methods to identify the informative probes that underlie the pathogenesis of tumor cell proliferation and to improve cancer classification accuracy
The proposed feature selection method DV1 uses statistical variation, in terms of the standard deviation, to obtain the discriminative value, while the other proposed feature selection method, DV2, uses entropy to rank features and selects the more variable features with a lower amount of uncertainty in their values
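The two ranking criteria can be sketched as follows. This is a minimal interpretation, not the authors' exact definitions: `dv1_rank` orders probes by the population standard deviation of their beta values across samples, and `dv2_rank` keeps probes whose standard deviation reaches a hypothetical `min_std` and then orders them by ascending Shannon entropy of their binned values, so that variable but low-uncertainty probes come first. The bin count, the threshold, the toy matrix and all function names are assumptions.

```python
import math
from collections import Counter
from statistics import pstdev

# matrix[i][j] is the beta value (0..1) of probe j in sample i.

def dv1_rank(matrix):
    """DV1-style ranking (assumed): probes sorted by standard deviation, largest first."""
    n_probes = len(matrix[0])
    scores = [pstdev([row[j] for row in matrix]) for j in range(n_probes)]
    return sorted(range(n_probes), key=lambda j: scores[j], reverse=True)

def shannon_entropy(values, n_bins=10):
    """Shannon entropy in bits of values binned into n_bins equal-width bins on [0, 1]."""
    counts = Counter(min(int(v * n_bins), n_bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def dv2_rank(matrix, min_std=0.1, n_bins=10):
    """DV2-style ranking (assumed): drop near-constant probes, then sort by entropy, lowest first."""
    kept = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        if pstdev(column) >= min_std:
            kept.append((shannon_entropy(column, n_bins), j))
    return [j for _, j in sorted(kept)]

# Toy data: 4 samples x 3 probes (not real methylation data).
matrix = [
    [0.1, 0.5, 0.2],
    [0.9, 0.5, 0.8],
    [0.1, 0.5, 0.8],
    [0.9, 0.5, 0.8],
]
print(dv1_rank(matrix))  # [0, 2, 1]
print(dv2_rank(matrix))  # [2, 0]
```

In this toy case, probe 0 varies most and wins under the standard-deviation criterion, while probe 2 is still variable but concentrates most of its values in one bin, so the entropy criterion ranks it first. The constant probe 1 is discarded by `dv2_rank` altogether.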
Summary
Cancer is a leading cause of death worldwide; it begins when some cells in a part of the body start to grow out of control. Recent research has added to the evidence that epigenetic modifications play a critical role in human cancer. Using feature selection and extraction techniques in cancer prediction has become essential for identifying the informative probes that underlie the pathogenesis of tumor cell proliferation. We propose a framework based on feature selection and extraction methods to remove irrelevant information and improve cancer classification accuracy based on DNA methylation data. A novel feature selection method based on statistical variation and standard deviation is used to identify a small set of discriminative methylated DNA probes; afterwards, the average methylation density of three regions (hypomethylation, midmethylation and hypermethylation) is calculated as new extracted features to predict cancer.
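To show how selection and extraction fit together, here is a toy end-to-end sketch. Probes are ranked by standard deviation on the training samples, each sample is reduced to the three region averages, and a nearest-centroid rule stands in for the classifier, which this summary does not name. The region cut-offs, `k`, the toy data, the classifier and all function names are illustrative assumptions.

```python
from statistics import mean, pstdev

# Hypothetical region cut-offs on the beta-value scale (not from the paper).
HYPO_MAX, HYPER_MIN = 0.3, 0.7

def select_top_k(train, k):
    """Indices of the k probes with the largest standard deviation over the training samples."""
    n_probes = len(train[0])
    sd = [pstdev([s[j] for s in train]) for j in range(n_probes)]
    return sorted(range(n_probes), key=lambda j: sd[j], reverse=True)[:k]

def region_means(sample, probes):
    """Three features: mean beta of the selected probes in the hypo/mid/hyper regions (0.0 if empty)."""
    betas = [sample[j] for j in probes]
    regions = (
        [b for b in betas if b < HYPO_MAX],
        [b for b in betas if HYPO_MAX <= b < HYPER_MIN],
        [b for b in betas if b >= HYPER_MIN],
    )
    return [mean(r) if r else 0.0 for r in regions]

def fit_centroids(features, labels):
    """Nearest-centroid stand-in classifier: per-class mean of the 3-feature vectors."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, feature):
    """Label whose centroid is closest to `feature` in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], feature)))

# Toy training data: 4 samples x 4 probes (beta values), not real methylation data.
train = [
    [0.90, 0.10, 0.50, 0.52],
    [0.88, 0.12, 0.50, 0.49],
    [0.45, 0.55, 0.50, 0.51],
    [0.50, 0.60, 0.50, 0.50],
]
labels = ["normal", "normal", "tumor", "tumor"]

probes = select_top_k(train, k=2)
model = fit_centroids([region_means(s, probes) for s in train], labels)
print(predict(model, region_means([0.85, 0.15, 0.50, 0.50], probes)))  # normal
print(predict(model, region_means([0.48, 0.58, 0.50, 0.50], probes)))  # tumor
```

Swapping the nearest-centroid rule for another classifier would only change `fit_centroids` and `predict`; the selection and extraction steps stay the same.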
Journal: International Journal of Advanced Computer Science and Applications