Bicluster Analysis of Biomedical Data based on Multi-objective Evolutionary Optimization

Maryam Golchin

doi:10.25904/1912/2189

Abstract

Knowledge discovery is the process of finding hidden knowledge in large volumes of data, and data mining is a central part of that process. Data mining unveils interesting relationships among data, and the results can help to make valuable predictions or recommendations in various applications. Recently, biclustering has become a common method in data mining and pattern recognition. Biclustering is an unsupervised machine learning method that can uncover and extract accurate and useful information from high-dimensional sparse data. It has found many useful applications for visualization and exploratory analysis in fields such as knowledge discovery, data mining, pattern classification, information retrieval, and collaborative filtering, and especially in gene expression data analysis tasks such as functional annotation, tissue classification, and motif identification. Previous studies have shown that finding biclusters in data is inherently intractable and computationally complex. In general, the challenges of biclustering include the high dimensionality of the data, noise, the different types of bicluster patterns, and the fact that biclusters can overlap. Although biclustering has been studied extensively, our review of the methods proposed in the literature found that these challenges are not addressed properly. Most of the proposed methods can detect only a limited set of bicluster patterns, and only under restrictive assumptions about the data. Moreover, many methods detect biclusters sequentially: the method replaces each detected bicluster with background values before detecting the next one, which prevents the detection of overlapping biclusters. There is therefore a need for innovative methods that extract valuable information from the data and lead to a deeper understanding of the outcomes.
Therefore, in this study, we first propose a method (PBD-SPEA) that uses a new dynamic encoding scheme to detect multiple overlapping biclusters concurrently. However, its implementation is complex, because several heuristic search procedures are used in different steps of the method, and it cannot detect all types of bicluster patterns. We therefore propose a second method (LBDP) based on geometrical biclustering. In this method, we search for hyperplanes in the data using an evolutionary algorithm. With this idea, we are able to detect all types of bicluster patterns concurrently. We defined several scenarios on both synthetic and real data to test the performance of the proposed methods. Although our work initially targeted biomedical data (gene expression data), we also tested the generality of the algorithms on non-medical data, such as image data and social networking data. In all scenarios, our methods achieved reliable results compared with several state-of-the-art methods.
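To make the geometric view concrete, the following is a minimal sketch, not the LBDP method itself. It assumes the usual reading of geometrical biclustering, in which common bicluster patterns (constant, additive, multiplicative) appear as linear relations between the columns of a submatrix. The function name `pattern_type` and the tolerance `tol` are hypothetical and chosen only for illustration.

```python
import numpy as np

def pattern_type(sub, tol=1e-8):
    """Classify a submatrix by the linear relation its columns satisfy.

    Illustrative only: returns 'constant', 'additive', 'multiplicative',
    or 'none'. Row-constant and column-constant submatrices are reported
    as 'additive', since they are special cases of that pattern.
    """
    sub = np.asarray(sub, dtype=float)

    # Constant pattern: every entry equals the same value.
    if np.allclose(sub, sub[0, 0], atol=tol):
        return "constant"

    # Additive pattern: x_j = x_0 + c_j for every column j, so the
    # difference to the first column is the same in every row.
    diffs = sub - sub[:, [0]]
    if np.allclose(diffs, diffs[0], atol=tol):
        return "additive"

    # Multiplicative pattern: x_j = k_j * x_0 for every column j, so the
    # ratio to the first column is the same in every row.
    if np.all(np.abs(sub[:, 0]) > tol):
        ratios = sub / sub[:, [0]]
        if np.allclose(ratios, ratios[0], atol=tol):
            return "multiplicative"

    return "none"


additive = [[1, 2, 4], [3, 4, 6], [0, 1, 3]]
multiplicative = [[1, 2, 3], [2, 4, 6], [4, 8, 12]]
print(pattern_type(additive))
print(pattern_type(multiplicative))
```

Because each pattern is a linear relation between columns, a search over linear structures (hyperplanes) can in principle cover all of them with one model, which is the motivation the abstract gives for the hyperplane-based approach.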
