Abstract

Background

Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have relied on variants of clustering, which often yield a number of false positives.

Principal Findings

Here we explore the use of seriation, a statistical approach for ordering sets of objects by their similarity, for large-scale expression pattern discovery in SAGE data. For this task we implement a seriation heuristic we term ‘progressive construction of contigs’, which builds local chains of related elements by sequentially rearranging the margins of the correlation matrix. We apply the heuristic to simulated and experimental SAGE data and compare the results with those of a clustering algorithm developed specifically for SAGE data. Simulations show that seriation performs favorably compared with the clustering algorithm on noisy SAGE data.

Conclusions

We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation identifies groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data and that its applicability should be further pursued.
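The abstract outlines ‘progressive construction of contigs’ only at a high level. The Python sketch below is one hypothetical reading of it, not the authors' implementation. It assumes Pearson correlation between per-gene tag-count profiles, an arbitrary cut-off (`threshold`, a hypothetical parameter), and a greedy rule that seeds each chain with the most correlated remaining pair and then grows it at whichever end (margin) has the strongest remaining candidate. The names `pearson`, `build_contigs`, and `seriate` are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def build_contigs(profiles, threshold=0.8):
    """Greedily grow chains ("contigs") of correlated genes (hypothetical rule).

    profiles: one expression vector (e.g. tag counts per library) per gene.
    threshold: assumed minimum correlation required to seed or extend a contig.
    Returns a list of contigs, each a list of gene indices in chain order.
    """
    n = len(profiles)
    corr = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    unplaced = set(range(n))
    contigs = []
    while unplaced:
        # Seed a new contig with the most correlated pair still unplaced.
        pairs = [(corr[i][j], i, j) for i, j in combinations(sorted(unplaced), 2)]
        if not pairs:
            contigs.append([unplaced.pop()])
            break
        best, i, j = max(pairs)
        if best < threshold:
            # No remaining pair is related strongly enough: emit singletons.
            contigs.extend([g] for g in sorted(unplaced))
            break
        contig = [i, j]
        unplaced -= {i, j}
        # Extend at the left or right end, whichever has the strongest candidate.
        while unplaced:
            left, right = contig[0], contig[-1]
            score, gene, side = max(
                max((corr[left][g], g, "left") for g in unplaced),
                max((corr[right][g], g, "right") for g in unplaced),
            )
            if score < threshold:
                break
            if side == "left":
                contig.insert(0, gene)
            else:
                contig.append(gene)
            unplaced.remove(gene)
        contigs.append(contig)
    return contigs


def seriate(profiles, threshold=0.8):
    """Concatenate the contigs into a single ordering of all genes."""
    return [g for contig in build_contigs(profiles, threshold) for g in contig]


if __name__ == "__main__":
    # Toy profiles: genes 0, 2, 4 rise across four libraries; genes 3, 5 fall.
    profiles = [
        [1, 2, 3, 4],
        [10, 1, 5, 0],
        [2, 4, 6, 8],
        [9, 7, 5, 3],
        [1, 2, 3, 5],
        [8, 6, 4, 1],
    ]
    print(build_contigs(profiles))
    print(seriate(profiles))
```

Reordering the rows and columns of the correlation matrix by the returned ordering places strongly correlated genes next to each other, so co-expressed groups appear as contiguous blocks along the diagonal. Real SAGE counts would typically need normalization for library size before computing correlations; that step is omitted from this sketch.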

Highlights

  • With the advent of high throughput technologies, large-scale gene expression studies have become routine in many biological laboratories

  • Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed for Serial Analysis of Gene Expression (SAGE) data

  • Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued


Summary

Introduction

With the advent of high throughput technologies, large-scale gene expression studies have become routine in many biological laboratories. Two conceptually different approaches to high throughput gene expression profiling are microarrays [1] and tag sequencing-based methods, such as Serial Analysis of Gene Expression (SAGE) [2]. A common aim of high throughput gene expression studies is to identify genes with similar expression profiles, since such genes may be functionally related and may be used to predict the functions of uncharacterized genes. This aim has most often been addressed with variants of clustering analysis, which group genes according to the correlations among their expression values [4,5]. Most analyses of SAGE data aimed at identifying co-expressed genes have relied on such clustering approaches, which often yield a number of false positives.

