Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes

Thanyaluk Jirapech-Umpai, Stuart Aitken

doi:10.1186/1471-2105-6-148

Abstract

Background: In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed.

Results: In the absence of feature selection, classification accuracy on the training data is typically good, but not replicated on the testing data. Gene selection using the RankGene software [3] is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criteria can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings, indicating the robustness of the approach. We assess performance using a low variance estimation technique, and present an analysis of the genes most often selected as predictors.

Conclusion: The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: a Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined.
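The Z-score analysis mentioned in the conclusion can be illustrated with a minimal sketch. The abstract does not give the formula, so the version below is an assumption: it compares how often each gene appears across a collection of predictors with the count expected if every predictor had drawn its genes uniformly at random. The function name `selection_z_scores` and its parameters are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def selection_z_scores(selected_sets, n_candidates):
    """Z-score of how often each gene appears across predictor gene sets.

    Assumed null model (not stated in the abstract): each predictor draws
    its genes uniformly at random, without replacement, from n_candidates
    genes, so a gene lands in a predictor of size d with probability
    d / n_candidates, independently across predictors.
    """
    # Count each gene at most once per predictor.
    counts = Counter(g for s in selected_sets for g in set(s))
    probs = [len(set(s)) / n_candidates for s in selected_sets]
    expected = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    if variance == 0:
        raise ValueError("null model has zero variance; Z-score is undefined")
    sd = math.sqrt(variance)
    return {gene: (count - expected) / sd for gene, count in counts.items()}


if __name__ == "__main__":
    predictors = [["g1", "g2"], ["g1", "g3"], ["g1", "g4"], ["g1", "g2"]]
    for gene, z in sorted(selection_z_scores(predictors, n_candidates=10).items()):
        print(gene, round(z, 2))
```

Genes with a large positive score are selected far more often than chance would suggest, which is the sense in which a frequently selected gene can be called discriminatory.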

Highlights

In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors
In this study we explore the alternative methods provided by the RankGene software [3] for the initial feature selection task
We begin by demonstrating that the performance of the population of predictors improves on each iteration of the evolutionary algorithm
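To make the idea of a population of predictors improving over iterations concrete, here is a minimal sketch of an evolutionary search over gene subsets. It is not the authors' implementation: the fitness function (leave-one-out accuracy of a 1-nearest-neighbour classifier), the truncation selection, the single-gene mutation, and all names and default parameters (`evolve`, `subset_size`, `pop_size`, `generations`) are assumptions chosen for illustration. It also assumes there are more candidate genes than `subset_size`.

```python
import random


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `genes`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            # Squared Euclidean distance restricted to the selected genes.
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def mutate(genes, n_genes, rng):
    """Swap one gene in the predictor for a gene not currently in it."""
    genes = list(genes)
    pos = rng.randrange(len(genes))
    genes[pos] = rng.choice([g for g in range(n_genes) if g not in genes])
    return tuple(sorted(genes))


def evolve(X, y, subset_size=3, pop_size=20, generations=15, seed=0):
    """Return the best gene subset found and the best fitness per generation."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [tuple(sorted(rng.sample(range(n_genes), subset_size)))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=lambda g: loo_accuracy(X, y, g), reverse=True)
        history.append(loo_accuracy(X, y, ranked[0]))
        # Keep the better half unchanged (elitism), refill with mutants.
        survivors = ranked[: pop_size // 2]
        children = [mutate(rng.choice(survivors), n_genes, rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = max(pop, key=lambda g: loo_accuracy(X, y, g))
    return best, history
```

Because the better half of the population is carried over unchanged, the best fitness recorded in `history` cannot decrease from one generation to the next. A practical version would cache fitness values rather than recompute them.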

Summary

Introduction

Samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. Microarray technology has provided biologists with the ability to measure the expression levels of thousands of genes in a single experiment. The goal of classification is to identify the differentially expressed genes that may be used to predict class membership for new samples. This paper addresses the multi-class classification of microarray data,

Journal: BMC Bioinformatics
Publication Date: Jan 1, 2005
Citations: 302
License type: CC BY 2.0
