Abstract

Background: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research.

Results: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs.

Conclusions: The Indygene tool efficiently removes paralogy relationships from a given dataset, and we found that such a reduction, performed prior to GSA, can generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches applied to the reanalysis of previously published microarray datasets, and it suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.

Highlights

  • In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes

  • Many methods proposed have shifted the focus from analysis of individual genes to sets of genes typically defined by their annotations to terms in databases such as the Gene Ontology (GO) [1], the Kyoto Encyclopaedia of Genes and Genomes (KEGG) [2] or the Molecular Signatures Database (MSigDB) [3]

  • Gene-set analysis (GSA) often represents the first attempt to make biological sense of the data obtained from a microarray, or any high-throughput experiment, and these methods enable the generation of hypotheses regarding the experiment

Summary

Introduction

In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. Many methods proposed have shifted the focus from analysis of individual genes to sets of genes typically defined by their annotations to terms in databases such as the Gene Ontology (GO) [1], the Kyoto Encyclopaedia of Genes and Genomes (KEGG) [2] or the Molecular Signatures Database (MSigDB) [3]. These gene-set analysis (GSA) methods aim to rank these sets in a way that reflects their relative contributions to the observed gene expression changes in a particular experiment. The incorporation of an independent representation of previously accumulated biological knowledge into the analysis has proven to be powerful [4], and shifting the focus from individual genes to sets of genes has been shown to identify biological themes more consistently across independent studies than results from single-gene analyses [3].
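The reduction step described in the abstract (shrinking a gene list until no pair of genes shares a paralogy relationship above a chosen sequence similarity threshold) can be illustrated with a small sketch. This is not the Indygene implementation: the text does not say how the tool decides which member of a paralogous pair to drop, so the greedy rule below (repeatedly remove the gene with the most remaining paralogs) is an assumption, as are the function name `reduce_paralogs`, the input format, and the made-up gene names and similarity scores.

```python
from itertools import combinations


def reduce_paralogs(genes, similarity, threshold):
    """Return a subset of `genes` in which no pair has similarity above `threshold`.

    `similarity` maps frozenset({gene_a, gene_b}) to a similarity score;
    missing pairs are treated as unrelated (hypothetical input format).
    The greedy removal rule is an illustrative assumption.
    """
    kept = list(genes)
    while True:
        # Count, for every kept gene, how many kept genes it is paralogous to.
        degree = {g: 0 for g in kept}
        for a, b in combinations(kept, 2):
            if similarity.get(frozenset((a, b)), 0.0) > threshold:
                degree[a] += 1
                degree[b] += 1
        # Pick the gene with the most paralogs; ties go to the earliest in the list.
        worst = max(kept, key=lambda g: degree[g], default=None)
        if worst is None or degree[worst] == 0:
            return kept  # no paralogy relationship above the threshold remains
        kept.remove(worst)


# Made-up example data: geneB is similar to both geneA and geneC.
genes = ["geneA", "geneB", "geneC", "geneD"]
similarity = {
    frozenset(("geneA", "geneB")): 0.95,
    frozenset(("geneB", "geneC")): 0.90,
    frozenset(("geneA", "geneC")): 0.40,
}
print(reduce_paralogs(genes, similarity, threshold=0.8))
```

With these toy scores, removing the single gene that links the other two is enough, so three of the four genes survive. A reduced list of this kind would then be passed to a GSA method in place of the full list.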
