Abstract
Background: Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. Given a microarray gene-expression matrix, biclustering has been the most common method for extracting RMs. Among biclustering methods, order-preserving biclustering by a sequential pattern mining technique has an inherent advantage over conventional biclustering approaches, since it preserves the order of genes (or conditions) according to the magnitude of the expression value. However, previous sequential pattern mining-based biclustering methods have two weak points: they easily become computationally intractable on real-size microarray data, and they are sensitive to the inherent noise in the expression values.
Results: In this paper, we propose a novel sequential pattern mining algorithm that is scalable in the size of the microarray data and robust with respect to noise. When applied to yeast microarray data, the proposed algorithm successfully found long order-preserving patterns, which are biologically significant but cannot be found in randomly shuffled data. The resulting patterns are well enriched for known annotations and are consistent with known biological knowledge. Furthermore, RMs as well as inter-module relations were inferred from the biologically significant patterns.
Conclusions: Our approach for identifying RMs could be valuable for systematically revealing the mechanism of gene regulation at a genome-wide level.
Highlights
Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions, has been an important challenge in functional genomics and bioinformatics
Our approach for identifying RMs could be valuable for systematically revealing the mechanism of gene regulation at a genome-wide level
The algorithms are tested on simulation data with embedded sequential patterns
Summary
Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. Given a microarray gene-expression matrix, comprising rows of genes and columns of samples (or conditions), biclustering has been the most common method for extracting RMs [5,6,7,8,9,10,11]. Among biclustering methods, order-preserving biclustering by a sequential pattern mining technique has an inherent advantage over conventional biclustering approaches, since it preserves the order of genes (or conditions) according to the magnitude of the expression value. The random replacement may interfere with the subsequent identification of biclusters.
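To make the order-preserving idea concrete, the following minimal Python sketch turns each gene's expression row into a sequence of condition indices sorted by expression value, then lists the genes whose sequence contains a given ordering of conditions as a subsequence. It is an illustration only, not the algorithm proposed in the paper: the function names, the toy matrix, and the tie-breaking rule (ties resolved by condition index) are all assumptions, and no noise handling or pattern search is attempted.

```python
from typing import Dict, List, Sequence


def row_to_sequence(values: Sequence[float]) -> List[int]:
    """Return condition indices ordered by ascending expression value.

    Assumption: ties are broken by condition index (sorted() is stable).
    """
    return sorted(range(len(values)), key=lambda j: values[j])


def is_subsequence(pattern: Sequence[int], sequence: Sequence[int]) -> bool:
    """Check that the items of `pattern` appear in `sequence` in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def supporting_genes(
    matrix: Dict[str, Sequence[float]], pattern: Sequence[int]
) -> List[str]:
    """List the genes whose condition ordering contains `pattern`."""
    return [
        gene
        for gene, values in matrix.items()
        if is_subsequence(pattern, row_to_sequence(values))
    ]


# Hypothetical toy matrix: rows are genes, columns are conditions 0..3.
expression = {
    "geneA": [0.1, 2.0, 1.0, 3.0],
    "geneB": [5.0, 9.0, 7.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}

# Pattern: expression in condition 0 < condition 2 < condition 1.
pattern = [0, 2, 1]
support = supporting_genes(expression, pattern)
print(support)
```

In this toy case geneA and geneB support the pattern even though their absolute expression levels differ, which is the property that distinguishes an order-preserving bicluster from one based on similar magnitudes. The genes in `support` together with the conditions in `pattern` form one candidate bi-set.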