Extracting regulatory modules from gene expression data by sequential pattern mining

Mingoo Kim, Je-Gun Joung, Hyunjung Shin, Tae Su Chung, Ju Han Kim

doi:10.1186/1471-2164-12-s3-s5

Open Access

https://doi.org/10.1186/1471-2164-12-s3-s5

Journal: BMC Genomics
Publication Date: Jan 1, 2011
Citations: 24
License type: cc-by

Affiliation: Seoul National University, Ajou University

Abstract

Background: Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. Given a microarray gene-expression matrix, biclustering has been the most common method for extracting RMs. Among biclustering methods, order-preserving biclustering by a sequential pattern mining technique has a native advantage over conventional biclustering approaches because it preserves the order of genes (or conditions) according to the magnitude of the expression value. However, previous sequential pattern mining-based biclustering methods have several weak points: they easily become computationally intractable on real-size microarray data, and they are sensitive to the inherent noise in the expression values.

Results: In this paper, we propose a novel sequential pattern mining algorithm that is scalable in the size of the microarray data and robust with respect to noise. When applied to yeast microarray data, the proposed algorithm successfully found long order-preserving patterns, which are biologically significant but cannot be found in randomly shuffled data. The resulting patterns are well enriched for known annotations and are consistent with known biological knowledge. Furthermore, RMs as well as inter-module relations were inferred from the biologically significant patterns.

Conclusions: Our approach for identifying RMs could be valuable for systematically revealing the mechanism of gene regulation at a genome-wide level.

Highlights

Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions, has been an important challenge in functional genomics and bioinformatics
Our approach for identifying RMs could be valuable for systematically revealing the mechanism of gene regulation at a genome-wide level
The algorithms are tested on simulation data with embedded sequential patterns

Summary

Introduction

Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. Given a microarray gene-expression matrix, comprising rows of genes and columns of samples (or conditions), biclustering has been the most common method for extracting RMs [5,6,7,8,9,10,11]. Among biclustering methods, order-preserving biclustering by a sequential pattern mining technique has a native advantage over conventional biclustering approaches because it preserves the order of genes (or conditions) according to the magnitude of the expression value. The random replacement may interfere with the subsequent identification of biclusters.
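To make the order-preserving idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes each gene's row is turned into a sequence of condition indices sorted by increasing expression value, and that a gene supports a pattern when the pattern occurs as a (not necessarily contiguous) subsequence of that sequence. The function names and the toy matrix are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def condition_order(expression: Sequence[float]) -> List[int]:
    """Return condition indices sorted by increasing expression value."""
    return sorted(range(len(expression)), key=lambda j: expression[j])


def supports(order: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if `pattern` is a subsequence of `order` (gaps are allowed)."""
    remaining = iter(order)
    # `c in remaining` advances the iterator, so matches must occur in order.
    return all(c in remaining for c in pattern)


def supporting_genes(matrix: Dict[str, Sequence[float]],
                     pattern: Sequence[int]) -> List[str]:
    """List the genes whose condition ordering contains `pattern`."""
    return [gene for gene, row in matrix.items()
            if supports(condition_order(row), pattern)]


# Hypothetical toy matrix: rows are genes, columns are conditions 0..3.
toy = {
    "geneA": [0.1, 0.9, 0.5, 0.3],
    "geneB": [1.0, 4.0, 3.0, 2.0],
    "geneC": [0.8, 0.2, 0.6, 0.4],
}

print(supporting_genes(toy, [0, 3, 2, 1]))  # ['geneA', 'geneB']
print(supporting_genes(toy, [3, 2]))        # ['geneA', 'geneB', 'geneC']
```

In this toy example, geneA and geneB have very different expression magnitudes but rank the four conditions identically, so they form an order-preserving bicluster over all four conditions. geneC shares only the shorter pattern (condition 3 before condition 2). A sequential pattern miner would search for such patterns instead of testing given ones as done here.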
