Abstract
BackgroundGrouping genes into clusters on the basis of similarity between their expression profiles has been the main approach to predict functional modules, from which important inference or further investigation decision could be made. While the univocal determination of similarity metric is important, current practices are normally involved with Euclidean distance and Pearson correlation, of which assumptions are not likely the case for high-throughput microarray data.ResultsWe advocate the use of a novel metric - BayesGen - to measure similarity between gene expression profiles, and demonstrate its performance on two important applications: constructing genome-wide co-expression network, and clustering cancer human tissues into subtypes. BayesGen is formulated as the evidence ratio between two alternative hypotheses about the generating mechanism of a given pair of genes, and incorporates as prior knowledge the global characteristics of the whole dataset. Through the joint modelling of expected intensity levels and noise variances, it addresses the inherent nonlinearity and the association of noise levels across different microarray value ranges. The full Bayesian formulation also facilitates the possibility of meta-analysis.ConclusionBayesGen allows more effective extraction of similarity information between genes from microarray expression data, which has significant effect on various inference tasks. It also provides a robust choice for other object-feature data, as illustrated through the results of the test on synthetic data.
Highlights
Grouping genes into clusters on the basis of similarity between their expression profiles has been the main approach to predict functional modules, from which important inference or further investigation decision could be made
We suggested the use of BayesGen - a new metric for measuring similarity/distance between gene expression profiles
Based on the observation that both data points' intensity levels and their relative variance jointly contribute to the identification of the underlying cellular processes, the metric was derived using a full Bayesian approach, which incorporates as prior knowledge the global characteristics of the whole dataset
Summary
Grouping genes into clusters on the basis of similarity between their expression profiles has been the main approach to predict functional modules, from which important inference or further investigation decision could be made. While the univocal determination of similarity metric is important, current practices are normally involved with Euclidean distance and Pearson correlation, of which assumptions are not likely the case for high-throughput microarray data. With the development of high-throughput experimental techniques, biological research have been transformed into a data-rich discipline. The analysis of microarray data is normally based on the reasoning that variations in gene expression patterns under different experimental conditions are the (page number not for citation purposes). By grouping genes with similar expression profiles into clusters, researchers have been able to make meaningful inference of regulatory modules and functional pathways [2]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.