Abstract

Modern microarray chips can hold gene information from thousands of genes and hundreds of individuals, and the main challenge for an effective feature selection method is to identify the most useful genes in the whole dataset. Removing less informative genes helps to alleviate the effects of noise and redundancy and simplifies disease classification and the prediction of medical conditions such as cancer. A Genetic Algorithm (GA) based wrapper model performs well but suffers from over-fitting, and its initial population is large and random. Traditional approaches use a filter-based preprocessing step to reduce the dimension of the data on which the GA operates. Because filtering methods on their own have been shown to introduce redundant features, this paper uses Boosted Feature Subset Selection (BFSS), a boosted t-score filter method, as the preprocessing step. The gene subset provided by BFSS is fed to a Genetic Algorithm, which reduces the feature subset further and helps to generate a more nearly optimal subset of genes. The proposed hybrid approach is applied to benchmark leukemia, colon, and lung cancer datasets and has shown better results than other well-known approaches.

General Terms: Pattern Recognition, Feature Selection, Microarray Data Analysis, Evolutionary Algorithm.
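The filter-then-wrapper pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the boosting step of BFSS, so a plain (unboosted) t-score ranking stands in for the filter. The nearest-centroid classifier, the leave-one-out fitness, the 0.01 per-gene size penalty, the GA settings, the synthetic data, and all function names (`t_scores`, `filter_top_k`, `loo_accuracy`, `ga_select`) are assumptions made purely for illustration.

```python
import math
import random
from statistics import mean, variance

def t_scores(X, y):
    """Absolute Welch t-score of each gene (column) between class 0 and class 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        scores.append(abs(mean(a) - mean(b)) / (se + 1e-12))
    return scores

def filter_top_k(X, y, k):
    """Filter step (plain t-score, NOT boosted): indices of the k top-ranked genes."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen genes."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in (0, 1):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            dist = sum((X[i][g] - mean(row[g] for row in rows)) ** 2 for g in genes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def ga_select(X, y, candidates, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Wrapper step: GA over bit masks of the filtered genes (needs >= 2 candidates)."""
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(mask):
        genes = [g for g, bit in zip(candidates, mask) if bit]
        # Assumed fitness: accuracy minus an arbitrary penalty per selected gene.
        return loo_accuracy(X, y, genes) - 0.01 * len(genes)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            children.append([1 - bit if rng.random() < p_mut else bit for bit in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return [g for g, bit in zip(candidates, best) if bit]

# Synthetic stand-in for a microarray: 20 samples, 30 genes, genes 0-2 informative.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[data_rng.gauss(2.0 * label if j < 3 else 0.0, 1.0) for j in range(30)] for label in y]

candidates = filter_top_k(X, y, k=8)      # filter reduces 30 genes to 8
selected = ga_select(X, y, candidates)    # GA searches subsets of those 8
print("filtered genes:", candidates)
print("GA-selected genes:", selected)
```

In this sketch the filter shrinks the search space before the GA runs, so each chromosome has only as many bits as there are filtered genes rather than one bit per gene on the chip. This is the motivation the abstract gives for combining the two steps.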
