Biclustering of Biological Sequences

Faouzi Mhamdi, Sourour Marai

doi:10.1109/dexa.2017.31

Abstract

The analysis of biological data is a challenging problem in the fields of bioinformatics and data mining. Given the complexity of this analysis, several methods have been proposed for analyzing the biological information stored in databases, mostly in the form of genetic sequences and protein structures. Genetic sequences are represented by matrices that indicate the expression levels of thousands of genes under several conditions. Analyzing this huge amount of data consists of extracting genes that behave similarly under certain conditions. The extracted information takes the form of sub-matrices (biclusters) that satisfy a coherence constraint, and the process of extracting them is called biclustering. In this paper, we deal with biclustering problems applied to the analysis of biological data. First, we review a description of the problem. We then describe the divide-and-conquer approach that we adopt in our algorithm for extracting biclusters. Additionally, we propose a new evaluation function called Pattern Correlation Value (PCV), which allows all bicluster types to be identified. Experimental results demonstrate that the proposed methods are effective on this problem and are able to extract relevant information from the considered data.
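
To make the notion of a sub-matrix satisfying a coherence constraint concrete, the following minimal sketch scores a candidate bicluster with the mean squared residue of Cheng and Church, a widely used coherence measure. It is not the PCV function proposed in this paper, whose definition is not given in the abstract. The function name, the toy expression matrix, and the chosen row and column indices are assumptions made purely for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the sub-matrix rows x cols.

    A value of 0 means the sub-matrix follows a perfect additive pattern;
    larger values mean less coherence. Illustrative stand-in only, not PCV.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    # Mean of each selected gene (row), each selected condition (column),
    # and of the whole sub-matrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix (made-up values): rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [5.0, 6.0, 7.0, 4.0],
    [7.0, 1.0, 8.0, 2.0],
]

# Genes 0-2 under conditions 0-2 differ only by a constant shift per gene.
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
# The full matrix mixes unrelated behaviours and scores much worse.
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
```

In this toy example the first three genes shift together across the first three conditions, so their residue is essentially zero, while the full matrix has a clearly larger residue. A biclustering algorithm searches for row and column subsets whose score under some such evaluation function passes a chosen threshold.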

Full Text