Abstract

DNA microarray technology monitors gene activity in living organisms in real time, generating large volumes of data that help scientists understand how genes function. Clustering these data helps reveal gene interactions and uncover important biological processes. However, traditional clustering techniques struggle with the very high dimensionality of gene expression data and the complexity of biological networks. Ensemble clustering is a viable strategy, but conventional ensemble approaches may not be well suited to such high-dimensional data. This study introduces a novel technique for clustering gene expression data, called incremental ensemble clustering for gene expression data (IECG). IECG consists of two steps. In the first step, the gene expression data are grouped into windows and clustered, producing a tree of clusters; this procedure is repeated for succeeding windows, each of which has a distinct feature set. In the second step, the base clusterings of two consecutive windows are combined using a new objective function to form a new clustering solution. Repeating this step-by-step procedure over further windows extracts reliable patterns that are useful for medical applications. Results on both biological and non-biological data show that the proposed algorithm outperforms state-of-the-art algorithms. The running time of the proposed algorithm is also examined.
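The two-step, window-by-window procedure described above can be illustrated with a minimal pure-Python sketch. It is not the IECG algorithm itself: the abstract does not specify the base clusterer or the new objective function, so this sketch assumes average-linkage agglomerative clustering for each window and a simple co-association (pairwise agreement) consensus when combining consecutive windows. All function names (`agglomerate`, `window_distance`, `co_association_distance`, `incremental_ensemble`) and the toy data are hypothetical.

```python
from itertools import combinations


def agglomerate(dist, k):
    """Average-linkage agglomerative clustering on a distance matrix.

    Assumed stand-in for the base clusterer; merges the closest pair of
    clusters until k clusters remain and returns one label per sample.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def window_distance(data, features):
    """Euclidean distances between samples, using only the window's features."""
    n = len(data)
    return [
        [sum((data[i][f] - data[j][f]) ** 2 for f in features) ** 0.5
         for j in range(n)]
        for i in range(n)
    ]


def co_association_distance(labelings):
    """1 minus the fraction of labelings that put samples i and j together."""
    n, m = len(labelings[0]), len(labelings)
    return [
        [1.0 - sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
        for i in range(n)
    ]


def incremental_ensemble(data, window_size, k):
    """Cluster each feature window, then fold it into the running consensus."""
    n_features = len(data[0])
    consensus = None
    for start in range(0, n_features, window_size):
        features = range(start, min(start + window_size, n_features))
        base = agglomerate(window_distance(data, features), k)
        if consensus is None:
            consensus = base
        else:
            # Assumed consensus step: re-cluster on pairwise agreement between
            # the previous solution and the new window's base clustering.
            agreement = co_association_distance([consensus, base])
            consensus = agglomerate(agreement, k)
    return consensus


# Toy data: six samples, four features, two well-separated groups.
toy = [
    [0.0, 0.1, 0.0, 0.2],
    [0.1, 0.0, 0.1, 0.0],
    [0.2, 0.1, 0.0, 0.1],
    [5.0, 5.1, 5.2, 5.0],
    [5.1, 5.0, 5.0, 5.1],
    [5.2, 5.1, 5.1, 5.2],
]
print(incremental_ensemble(toy, window_size=2, k=2))
```

Because each window touches only a small subset of features, the distance computation per window stays cheap even when the full feature set is large; only the sample-by-sample agreement matrix is carried between windows.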