Abstract

BackgroundHigh-throughput microarray technologies have generated and accumulated massive amounts of gene expression datasets that contain expression levels of thousands of genes under hundreds of different experimental conditions. The microarray datasets are usually presented in 2D matrices, where rows represent genes and columns represent experimental conditions. The analysis of such datasets can discover local structures composed by sets of genes that show coherent expression patterns under subsets of experimental conditions. It leads to the development of sophisticated algorithms capable of extracting novel and useful knowledge from a biomedical point of view. In the medical domain, these patterns are useful for understanding various diseases, and aid in more accurate diagnosis, prognosis, treatment planning, as well as drug discovery.ResultsIn this work we present the CMOPSOB (Crowding distance based Multi-objective Particle Swarm Optimization Biclustering), a novel clustering approach for microarray datasets to cluster genes and conditions highly related in sub-portions of the microarray data. The objective of biclustering is to find sub-matrices, i.e. maximal subgroups of genes and subgroups of conditions where the genes exhibit highly correlated activities over a subset of conditions. Since these objectives are mutually conflicting, they become suitable candidates for multi-objective modelling. Our approach CMOPSOB is based on a heuristic search technique, multi-objective particle swarm optimization, which simulates the movements of a flock of birds which aim to find food. In the meantime, the nearest neighbour search strategies based on crowding distance and ϵ-dominance can rapidly converge to the Pareto front and guarantee diversity of solutions. We compare the potential of this methodology with other biclustering algorithms by analyzing two common and public datasets of gene expression profiles. In all cases our method can find localized structures related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The mined patterns present a significant biological relevance in terms of related biological processes, components and molecular functions in a species-independent manner.ConclusionThe proposed CMOPSOB algorithm is successfully applied to biclustering of microarray dataset. It achieves a good diversity in the obtained Pareto front, and rapid convergence. Therefore, it is a useful tool to analyze large microarray datasets.

Highlights

  • High-throughput microarray technologies have generated and accumulated massive amounts of gene expression datasets that contain expression levels of thousands of genes under hundreds of different experimental conditions

  • The proposed CMOPSOB algorithm is successfully applied to biclustering of microarray dataset

  • We have provided a novel multi-objective particle swarm optimization (PSO) framework for mining biclusters from microarray datasets

Read more

Summary

Introduction

High-throughput microarray technologies have generated and accumulated massive amounts of gene expression datasets that contain expression levels of thousands of genes under hundreds of different experimental conditions. The microarray datasets are usually presented in 2D matrices, where rows represent genes and columns represent experimental conditions The analysis of such datasets can discover local structures composed by sets of genes that show coherent expression patterns under subsets of experimental conditions. With the advent of DNA microarray technology, it is allowable for simultaneously measuring the expression level of thousands of genes under different conditions in a single experiment In this way the scientific community can collect a huge amount of gene expression datasets. Microarray technique has been widely used in several contexts such as tumour profiling, drug discovery and temporal analysis of cell behaviour [1,2] Applications of these microarray data contain the study of gene expression in yeast under different environmental stress conditions and the comparisons of gene expression profiles for tumour from cancer patients. By comparing gene expression in normal and disease sells, microarrays may be used to identify disease genes and targets for therapeutic drugs

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call