Abstract

Background: Biologists aim to understand the genetic background of diseases, metabolic disorders or any other genetic condition. Microarrays are one of the main high-throughput technologies for collecting information about the behaviour of genetic information under different conditions. Clustering is one of the main techniques used to analyse these data; it aims to find groups of genes that share some criterion, such as a similar expression profile. However, the problem of finding groups is normally multi-dimensional, which makes it necessary to approach clustering as a multi-objective problem in which several cluster validity indexes are optimised simultaneously. These indexes are usually based on criteria such as compactness and separation, which may not be sufficient because they cannot guarantee clusters that have both similar expression patterns and biological coherence.

Method: We propose a Multi-Objective Clustering algorithm Guided by a-Priori Biological Knowledge (MOC-GaPBK) to find clusters of genes with high levels of co-expression, biological coherence, and good compactness and separation. Cluster quality indexes are used to simultaneously optimise gene relationships at the expression level and at the level of biological functionality. The proposal also includes intensification and diversification strategies to improve the search process.

Results: The effectiveness of the proposed algorithm is demonstrated on four publicly available datasets. Comparative studies on the use of different objective functions and against other widely used microarray clustering techniques are reported. Statistical, visual and biological significance tests are carried out to show the superiority of the proposed algorithm.

Conclusions: Integrating a-priori biological knowledge into a multi-objective approach and using intensification and diversification strategies allow the proposed algorithm to find solutions of higher quality than other microarray clustering techniques available in the literature in terms of co-expression, biological coherence, compactness and separation.

Highlights

  • Biologists aim to understand the genetic background of diseases, metabolic disorders or any other genetic condition

  • All the algorithms used in the experiments were implemented in R [55] version 3.2.5, and computational tests were performed on a computer with Intel Xeon E5-2650V4 30 MB, 4 CPUs, 2.2 GHz, 96 cores/threads, 128 GB RAM, 4 TB

  • The datasets used for the experiments are four real-life microarray gene expression datasets: Arabidopsis thaliana [58], yeast cell cycle [59], yeast cell sporulation [60], and human fibroblasts serum [61], which were taken from [62]


Summary

Introduction

Biologists aim to understand the genetic background of diseases, metabolic disorders or any other genetic condition. A DNA microarray is used to collect information about gene expression levels [1] under different conditions, such as a time series during a biological process or experiments on different tissue samples, among others [2]. This high-throughput technology has enabled rapid progress in biological and biomedical research [3], and it has facilitated the study of problems such as differential gene expression [4, 5], patterns of genes with (dis)similar expression levels [6,7,8], prediction of response to treatment [9, 10] and detection of gene mutations [11]. Clustering can be seen as an optimisation problem [15] where a cluster index (objective function) is optimised to obtain high-quality clustering solutions.
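
To make the idea of clustering as an optimisation problem concrete, the following minimal Python sketch scores two candidate partitions of a toy expression matrix with two simple indexes and compares them by Pareto dominance. It is an illustration only, not the MOC-GaPBK algorithm or the authors' R implementation: the function names (`compactness`, `separation`, `objectives`, `dominates`), the centroid-based index definitions and the toy data are all assumptions, and the biological-coherence objective is omitted.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroids_of(profiles, labels):
    """Map each cluster label to the mean profile of its members."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def compactness(profiles, labels):
    """Mean distance of each profile to its cluster centroid (lower is better)."""
    cents = centroids_of(profiles, labels)
    total = sum(euclidean(p, cents[l]) for p, l in zip(profiles, labels))
    return total / len(profiles)


def separation(profiles, labels):
    """Smallest distance between two centroids (higher is better).

    Requires at least two clusters.
    """
    cents = centroids_of(profiles, labels)
    return min(euclidean(a, b) for a, b in combinations(cents.values(), 2))


def objectives(profiles, labels):
    """Objective vector with every entry minimised (separation is negated)."""
    return (compactness(profiles, labels), -separation(profiles, labels))


def dominates(a, b):
    """True if objective vector a Pareto-dominates b under minimisation."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


# Hypothetical toy data: four genes measured under two conditions.
profiles = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = [0, 0, 1, 1]  # groups genes with similar profiles
bad = [0, 1, 0, 1]   # mixes distant genes in each cluster

print(objectives(profiles, good))
print(objectives(profiles, bad))
print(dominates(objectives(profiles, good), objectives(profiles, bad)))
```

In a multi-objective setting, a search procedure would keep the set of partitions that no other candidate dominates, rather than a single best score. Here the `good` partition is both more compact and better separated, so it dominates `bad`.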

