Selecting maximally informative genes

Ioannis P Androulakis

doi:10.1016/j.compchemeng.2004.08.037

Abstract

Microarray experiments are emerging as one of the main driving forces in modern biology. By allowing the simultaneous monitoring of the expression of the entire genome for a given organism, array experiments provide tremendous insight into the fundamental biological processes that translate genetic information. One of the major challenges is to identify computationally efficient and biologically meaningful analysis approaches that extract the most informative and unbiased components of the microarray data. This process is complicated by the number of uncertainties associated with array experiments. Therefore, the assumption that a unique computational descriptive model exists needs to be challenged. In this paper, we introduce a framework that integrates machine learning and optimization techniques for the selection of maximally informative genes in microarray expression experiments. The fundamental premise of the approach is that maximally informative genes are the ones that lead to the least complex descriptive and predictive models. We propose a methodology, based on decision trees, which identifies ensembles of groups of maximally informative genes. We raise a number of computational issues that need to be comprehensively addressed and illustrate the approach by analyzing recently published microarray experimental data.
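The premise that informative genes yield the least complex models can be illustrated with a minimal sketch. It is not the methodology of the paper: it assumes the simplest possible decision tree (a single-threshold split on one gene, often called a stump) and ranks genes by how many samples that split misclassifies. The function names `best_stump` and `rank_genes`, the gene names, and the expression values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def best_stump(values: List[float], labels: List[int]) -> Tuple[float, int]:
    """Return (threshold, errors) for the best one-threshold split on one gene.

    Labels are 0 or 1. A gene with no usable threshold (all values equal)
    keeps the worst possible score, len(values).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_err = pairs[0][0], n
    for i in range(n - 1):
        lo, hi = pairs[i][0], pairs[i + 1][0]
        if lo == hi:
            continue  # no threshold fits between identical values
        thr = (lo + hi) / 2.0
        # Rule "class 1 above the threshold"; the flipped rule has n - err errors.
        err = sum(1 for v, y in pairs if (v > thr) != (y == 1))
        err = min(err, n - err)
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr, best_err


def rank_genes(
    expr: Dict[str, List[float]], labels: List[int]
) -> List[Tuple[str, float, int]]:
    """Rank genes by stump error (fewest errors first, ties broken by name)."""
    scored = [(gene, *best_stump(vals, labels)) for gene, vals in expr.items()]
    return sorted(scored, key=lambda item: (item[2], item[0]))


# Made-up expression values for six samples (three per class), illustration only.
toy_expr = {
    "gene_a": [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],
    "gene_b": [0.5, 0.9, 0.4, 0.6, 0.8, 0.7],
    "gene_c": [2.0, 2.1, 1.9, 0.2, 0.1, 0.3],
}
toy_labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for gene, thr, err in rank_genes(toy_expr, toy_labels):
        print(f"{gene}: threshold={thr:.2f}, misclassified={err}")
```

In this toy data two different genes separate the classes equally well with a one-split model, which echoes the point that a unique descriptive model should not be assumed and that several alternative groups of genes may be equally informative.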
