Abstract
Background: Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space.
Results: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples.
Conclusions: Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
Highlights
Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points
OPTricluster is applied to analyze four different 3D gene expression datasets: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance (SAR) in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and Brassica napus whole seed development
The GO analysis plug-in of the Gene Ontology Analysis (GOAL) [23] package that we recently developed is integrated into OPTricluster for biological evaluation of the clusters
Summary
It is possible to collect expression levels of a set of genes for a given set of biological samples during a series of time points. Such data have three dimensions, gene-sample-time (GST), and are called 3D gene expression data. Time-series gene expression data are widely used to study the dynamic behaviour of various biological processes in the cell [3,4,5]. They can be classified into two categories (relative to the clustering algorithms designed for their analysis): short time-... Coherent clusters of this kind may contain information that could be used to identify useful phenotypes, potential genes related to these phenotypes, and their interaction/regulation.
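The abstract describes combining an order-preserving view of the time dimension with a combinatorial search over the sample dimension. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the nested-dictionary data layout, and the toy values are all hypothetical. It also assumes strict ranking of time points with no noise threshold, which the published algorithm may handle differently.

```python
from collections import defaultdict
from itertools import combinations


def order_pattern(values):
    """Return time indices sorted by expression level (ties keep time order)."""
    return tuple(sorted(range(len(values)), key=lambda t: values[t]))


def op_triclusters(data, samples, min_samples=1):
    """Group genes whose temporal ordering is identical across a sample subset.

    data[gene][sample] is a list of expression values over time (assumed layout).
    Returns {(sample_subset, pattern): [genes]}.
    """
    # Order-preserving step: reduce each gene/sample profile to its ranking.
    patterns = {g: {s: order_pattern(data[g][s]) for s in samples} for g in data}

    clusters = defaultdict(list)
    # Combinatorial step: enumerate every subset of samples of the allowed size.
    for k in range(min_samples, len(samples) + 1):
        for subset in combinations(samples, k):
            for gene, by_sample in patterns.items():
                shared = {by_sample[s] for s in subset}
                if len(shared) == 1:  # same ordering in every sample of the subset
                    clusters[(subset, shared.pop())].append(gene)
    return dict(clusters)


# Toy data: three genes, two samples, three time points (illustrative values only).
data = {
    "g1": {"A": [1.0, 2.0, 3.0], "B": [0.5, 1.5, 2.5]},
    "g2": {"A": [2.0, 4.0, 9.0], "B": [3.0, 2.0, 1.0]},
    "g3": {"A": [0.1, 0.2, 0.3], "B": [1.0, 2.0, 4.0]},
}
result = op_triclusters(data, ["A", "B"])
```

In this toy input, genes that rise over time in both samples end up grouped under the subset `("A", "B")`, while a gene that rises in one sample and falls in the other appears only in single-sample groups. Enumerating all sample subsets grows exponentially with the number of samples, so a sketch like this is only practical when the sample count is small.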