A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series.

Sara C Madeira, Arlindo L Oliveira

doi:10.1186/1748-7188-4-8

Abstract

Background: The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters.

Methods: In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, to discover anticorrelated and scaled expression patterns, and to compute the errors allowed in the expression patterns in different ways. We propose a scoring criterion combining the statistical significance of expression patterns with a similarity measure between overlapping biclusters.

Results: We present results on real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state-of-the-art methods that require exact matching of gene expression time series.

Discussion: The identification of co-regulated genes, involved in specific biological processes, remains one of the main avenues open to researchers studying gene regulatory networks. The ability of the proposed methodology to efficiently identify sets of genes with similar expression patterns is shown to be instrumental in the discovery of relevant biological phenomena, leading to more convincing evidence of specific regulatory mechanisms.

Availability: A prototype implementation of the algorithm coded in Java together with the dataset and examples used in the paper is available in .
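The idea stated in the abstract (discretize the matrix, then look for approximately matching patterns over contiguous time points) can be illustrated with a small Python sketch. This is not the authors' algorithm: it is a naive brute-force search, it only tries patterns that actually occur in the data, and it does not filter for maximality. The three-symbol alphabet (D, N, U), the `threshold` parameter, the use of substitution errors only, and all function names are assumptions made for illustration.

```python
def discretize(matrix, threshold=0.5):
    """Turn each row of expression values into a string over {D, N, U}.

    Each symbol describes the change between two consecutive time points:
    U = up, D = down, N = no significant change (assumed alphabet).
    """
    rows = []
    for values in matrix:
        symbols = []
        for prev, curr in zip(values, values[1:]):
            delta = curr - prev
            if delta > threshold:
                symbols.append("U")
            elif delta < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        rows.append("".join(symbols))
    return rows


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_ccc_biclusters(rows, max_errors=1, min_genes=2, min_cols=2):
    """Naive search for contiguous-column biclusters with approximate patterns.

    For every window of contiguous columns, every substring observed in that
    window is tried as a candidate pattern, and the genes whose substring is
    within max_errors substitutions of it are collected.
    Assumes a non-empty list of equal-length strings.
    """
    n_cols = len(rows[0])
    found = {}
    for start in range(n_cols):
        for end in range(start + min_cols, n_cols + 1):
            window = [row[start:end] for row in rows]
            for pattern in set(window):
                genes = [i for i, w in enumerate(window)
                         if hamming(w, pattern) <= max_errors]
                if len(genes) >= min_genes:
                    found[(start, end, pattern)] = genes
    return found


if __name__ == "__main__":
    # Hypothetical toy data: 4 genes measured at 4 time points.
    expression = [
        [1.0, 2.0, 3.0, 2.0],
        [0.0, 1.0, 2.0, 1.0],
        [5.0, 6.0, 6.1, 5.0],
        [3.0, 2.0, 1.0, 2.0],
    ]
    rows = discretize(expression)
    for key, genes in sorted(approx_ccc_biclusters(rows, max_errors=1).items()):
        print(key, genes)
```

With `max_errors=0` the search only groups genes whose discretized patterns match exactly. Allowing one error additionally admits a gene whose pattern differs in a single time point, which is the kind of approximate match the abstract contrasts with exact-matching methods.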

Highlights

The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series obtained from microarray experiments, is critical to advance our understanding of complex biological processes.
We propose e-CCC-Biclustering, a biclustering algorithm developed for time series expression data analysis, that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the expression matrix.
All the results presented are based on the analysis of Gene Ontology annotations obtained using the GOToolbox database [28], together with information about transcriptional regulations available in the YEASTRACT database [29].

Summary

Introduction

The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Being able to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses of many interacting components, should provide the basis for understanding evolving but complex biological processes, such as disease progression, growth, development, and drug responses [2]. In this context, several machine learning methods have been used in the analysis of gene expression data [3]. This fact led several authors to point out the relevance of biclusters with contiguous columns and their importance in the identification of regulatory mechanisms [9,20,22,24].



Journal: Algorithms for Molecular Biology
Publication Date: Jun 4, 2009
Citations: 70
License type: CC BY 2.0


