Abstract

As a high-throughput detection technology, gene chips produce huge amounts of gene expression data, and analyzing these data effectively has become an urgent need. Biclustering techniques are important tools for finding local patterns in gene expression data. Biclustering seeks submatrices in which a subset of the genes shows a “highly correlated behavior in a subset of conditions”. However, most existing biclustering algorithms cannot find biclusters with contiguous columns. Because time-series data carry an important internal sequential relationship, these methods are not suitable for analyzing them. To explore the potential biological information in contiguous time points and to find co-expression relationships among genes, this paper presents an efficient and accurate algorithm, named the k-CCC algorithm, that searches for contiguous coherent evolution biclusters in time-series data. The algorithm first transforms the original matrix into a difference matrix; then, starting from column patterns consisting of k contiguous columns, it gradually assembles them into patterns composed of more columns. A pattern update strategy is adopted to improve the efficiency of the algorithm. In simulated tests, the algorithm finds all the embedded biclusters and shows good scalability. Experimental results on real datasets show that the algorithm can find biclusters with statistical significance and strong biological relevance.
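
The two steps named in the abstract (a difference matrix, then growth from patterns of k contiguous columns) can be illustrated with a minimal Python sketch. It is not the authors' k-CCC implementation. The abstract does not specify how differences are encoded, so the up/down/no-change symbols (`U`, `D`, `N`), the `threshold` and `min_rows` parameters, and all function names are assumptions made for illustration. The pattern update strategy is not described in the abstract and is not modeled, so the sketch also reports non-maximal patterns.

```python
from collections import defaultdict


def difference_matrix(matrix, threshold=0.0):
    """Encode each change between adjacent time points as 'U', 'D', or 'N'.

    The three-symbol encoding is an assumption, not taken from the paper.
    """
    diff = []
    for row in matrix:
        symbols = []
        for prev, curr in zip(row, row[1:]):
            delta = curr - prev
            if delta > threshold:
                symbols.append("U")
            elif delta < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        diff.append(symbols)
    return diff


def seed_patterns(diff, k, min_rows=2):
    """Group rows by their symbols over every window of k contiguous difference columns."""
    n_cols = len(diff[0]) if diff else 0
    seeds = {}
    for start in range(n_cols - k + 1):
        groups = defaultdict(list)
        for r, symbols in enumerate(diff):
            groups["".join(symbols[start:start + k])].append(r)
        for pattern, rows in groups.items():
            if len(rows) >= min_rows:
                seeds[(start, pattern)] = rows
    return seeds


def extend_patterns(diff, patterns, min_rows=2):
    """Extend each pattern by one column to the right, splitting rows by the next symbol."""
    n_cols = len(diff[0]) if diff else 0
    extended = {}
    for (start, pattern), rows in patterns.items():
        nxt = start + len(pattern)
        if nxt >= n_cols:
            continue  # the pattern already reaches the last difference column
        groups = defaultdict(list)
        for r in rows:
            groups[diff[r][nxt]].append(r)
        for symbol, sub_rows in groups.items():
            if len(sub_rows) >= min_rows:
                extended[(start, pattern + symbol)] = sub_rows
    return extended


def contiguous_biclusters(matrix, k=2, min_rows=2, threshold=0.0):
    """Return {(start_column, pattern): rows} for every pattern shared by at least min_rows rows."""
    diff = difference_matrix(matrix, threshold)
    level = seed_patterns(diff, k, min_rows)
    found = {}
    while level:
        found.update(level)
        level = extend_patterns(diff, level, min_rows)
    return found
```

Each key pairs a starting column of the difference matrix with a symbol string. A pattern of length m starting at column s corresponds to original time points s through s + m, so the columns of every reported bicluster are contiguous by construction.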

