Abstract

BackgroundMicroarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes in a single experiment. However, the large number of genes greatly increases the challenges of analyzing, comprehending and interpreting the resulting mass of data. Selecting a subset of important genes is inevitable to address the challenge. Gene selection has been investigated extensively over the last decade. Most selection procedures, however, are not sufficient for accurate inference of underlying biology, because biological significance does not necessarily have to be statistically significant. Additional biological knowledge needs to be integrated into the gene selection procedure.ResultsWe propose a general framework for gene ranking. We construct a bipartite graph from the Gene Ontology (GO) and gene expression data. The graph describes the relationship between genes and their associated molecular functions. Under a species condition, edge weights of the graph are assigned to be gene expression level. Such a graph provides a mathematical means to represent both species-independent and species-dependent biological information. We also develop a new ranking algorithm to analyze the weighted graph via a kernelized spatial depth (KSD) approach. Consequently, the importance of gene and molecular function can be simultaneously ranked by a real-valued measure, KSD, which incorporates the global and local structure of the graph. Over-expressed and under-regulated genes also can be separately ranked.ConclusionThe gene-function bigraph integrates molecular function annotations into gene expression data. The relevance of genes is described in the graph (through a common function). The proposed method provides an exploratory framework for gene data analysis.

Highlights

  • Microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes in a single experiment

  • Applying the kernelized spatial depth (KSD) ranking algorithm to the bigraphs constructed from gene expression responses to each agent, all induced and repressed genes were ordered separately according to their KSD values

  • Our goal is to identify genes differentially modulated between the genotoxic and the cytotoxic agents

Read more

Summary

Introduction

Microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes in a single experiment. Selecting a subset of important genes is inevitable to address the challenge. Additional biological knowledge needs to be integrated into the gene selection procedure. Introduction Microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Selecting a subset of important genes is necessary to address the challenge for two primary reasons. This problem is aggravated when the number of variables is large compared to the number of examples, and even worse for gene expression data which usually has ten or twenty thousand genes but with only a very limited number of samples. To answer the question of what genes are important for distinguishing between cancerous and normal tissue may lead to new medical practices

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call