Abstract

Motivations. Biclustering or simultaneous clustering of both genes and conditions have generated considerable interest over the past few decades, particularly related to the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Given a gene expression data matrix, a bicluster is a submatrix of genes and conditions that exhibits a high correlation of expression activity across both rows and columns. The problem of locating the most significant bicluster has been shown to be NP-complete. Therefore, given the inner intractability of the problem from a computational point of view, to efficiently find good solutions in a reasonable running times we have designed and implemented several different metaheuristics based on a GRASP framework. Methods. All the procedures are based on a GRASP (Greedy Randomized Adaptive Search Procedure) framework [2]. A GRASP is a randomized multistart iterative metaheuristic consisting of two phases: a construction phase and a local search phase. The construction phase builds iteratively a feasible solution in a greedy randomized fashion. Once a feasible solution is obtained, a local search procedure attempts to improve it by producing a locally optimal solution with respect to suitable defined neighborhood structure. The construction and the local search phases are repeatedly applied. The best local optimum solution found is returned as an approximation of the optimal one. We designed and implemented several different GRASP-based metaheuristics that differ in both construction and local search phase. In the construction phase, one implements a k-means and a second one a greedy randomized procedure inspired by a minimum spanning tree of a suitable weighted graph. Then, two types of local searches have been implemented: one has been already proposed [3] and a second one is an Iterated Local Search [4]. All the designed algorithms have been tested and compared using the lymphoma dataset [5]. Results. Both the solution construction procedure and the local search procedure use a gain function that combines the mean squared residue [1], the row variance, and the size of the bicluster. Experimental results on Lymphoma microarray data set [5] are promising indicating that the methods are able to find significant biclusters, also from a biological point of view. Acknowledgements A.M. and L.M. are supported by MIUR FIRB ITALBIONET (RBPR05ZK2Z and RBIN064YAT_003). The work has been made in the frame of the Flagship Project InterOmics.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.