Abstract
Background
Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons. Here we present a new bioinformatic tool termed Shambhala for harmonization of multiple human gene expression datasets obtained using different experimental methods and platforms of microarray hybridization and RNA sequencing.
Results
Unlike previously published methods, which enable good-quality data harmonization for only two datasets, Shambhala allows conversion of multiple datasets into a universal form suitable for further comparisons. Shambhala harmonization is based on the calibration of gene expression profiles using an auxiliary standardization dataset. Each profile is transformed to make it similar to the output of the microarray hybridization platform Affymetrix Human Gene. This platform was chosen because it has the largest number of human gene expression profiles deposited in public databases. We evaluated Shambhala's ability to retain biologically important features after harmonization. The same four biological samples, taken in multiple replicates, were profiled independently using three and four different experimental platforms, respectively, then Shambhala-harmonized and examined by hierarchical clustering.
Conclusion
Our results showed that, among the methods tested, which included the frequently used quantile normalization and DESeq/DESeq2 normalization, Shambhala harmonization was the only one that supported sample-specific, platform-independent, and biologically meaningful clustering of data obtained from multiple experimental platforms.
Highlights
Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons
In its present form, the method is tailored to the comparison of human gene expression data, and its application to data from other organisms requires a further, organism-specific data search
When selecting the optimal auxiliary calibration dataset (P0) for the Shambhala implementation, we found that our previous experimental dataset of 39 human gene expression profiles, obtained using the CustomArray microchip platform (CustomArray, USA), showed the best performance in clustering tests compared with more than twenty other datasets of comparable size
Summary
Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons. The most popular repositories, such as Gene Expression Omnibus (GEO) [3] and ArrayExpress [4], hold data for more than 2 million individual expression profiles in more than 70,000 series of experiments. These transcriptional profiles were generally obtained using different experimental modifications of microarray hybridization and RNA sequencing. The resulting non-comparability of gene expression data across platforms hampers further levels of analysis for the different datasets, e.g. finding differentially expressed genes and assessing activation of molecular pathways [11, 12]. To solve this problem, the data must be either normalized (when the datasets under comparison were obtained using one experimental platform) or harmonized (when different platforms were used) [12]. In most cases of harmonization, the distributions of entire gene expression profiles need to be reshaped.
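To make "reshaping the distribution of an entire profile" concrete, the sketch below implements plain quantile normalization, one of the comparison methods named in the abstract. It is not the Shambhala algorithm, which instead calibrates each profile against an auxiliary dataset. The function name `quantile_normalize`, the list-of-lists input layout, and the toy values are assumptions made for illustration only.

```python
def quantile_normalize(profiles):
    """Quantile-normalize expression profiles (illustrative sketch).

    profiles: list of samples, each a list of expression values for the
    same genes in the same order (hypothetical layout).
    Returns new lists in which every sample has the same value distribution.
    """
    n_genes = len(profiles[0])
    if any(len(p) != n_genes for p in profiles):
        raise ValueError("all profiles must cover the same genes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_profiles = [sorted(p) for p in profiles]
    reference = [
        sum(sp[k] for sp in sorted_profiles) / len(profiles)
        for k in range(n_genes)
    ]

    normalized = []
    for p in profiles:
        # Gene indices ordered by expression; ties keep gene order (stable sort).
        order = sorted(range(n_genes), key=lambda g: p[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Each gene takes the reference value for its rank in this sample.
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two samples, three genes.
result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(result)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After this transform every sample shares one value distribution, while the rank order of genes within each sample is preserved. Under the normalized/harmonized distinction above, this kind of step belongs to normalization of data from one platform; per the abstract, it did not yield platform-independent clustering for data from multiple platforms.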