Abstract
Background: Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question in such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully to achieve this goal.

Results: This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a united gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their probabilities of occurrence in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, enabling accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis.

Conclusions: Data heterogeneity largely influences the performance of meta-analysis for DEG identification. Existing meta-analysis methods were found to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can be a standalone tool due to its united framework and controllable way of dealing with study heterogeneity.
Highlights
Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics
For simulation-II, we assumed that differentially expressed genes (DEGs) can be differentially expressed in different directions in different studies, and considered two groups of categories of differential expression. The first group has differential expression in all ten studies and consists of three categories: 1) differentially expressed in the same direction in all ten studies; 2) differentially expressed in one direction in seven of the ten studies and in the other direction in the remaining three; 3) differentially expressed in one direction in five of the ten studies and in the other direction in the remaining five
The 3281 DEGs were further divided by jGRP into two categories by regulatory direction: 1655 (Additional file 1: Table S1) had a negative jGRP statistic, indicating down-regulation in lung adenocarcinoma (LUAD) tissues relative to normal tissues, and 1626 (Additional file 1: Table S2) had a positive jGRP statistic, indicating up-regulation in LUAD
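The three first-group categories of simulation-II described in the highlight above can be pictured as per-study direction vectors. The following is a minimal sketch, not the authors' simulation code: the names `N_STUDIES`, `MAJORITY_COUNTS`, and `direction_pattern` are hypothetical, and the choice of which studies carry the opposite direction (here optionally shuffled) is an assumption, since the excerpt does not specify it. Effect sizes and noise are deliberately left out.

```python
import random

N_STUDIES = 10  # simulation-II uses ten studies (from the text)

# Number of studies sharing the majority direction in each category
# of the first group (all ten studies show differential expression).
MAJORITY_COUNTS = {1: 10, 2: 7, 3: 5}


def direction_pattern(category, majority=1, rng=None):
    """Return one direction (+1 up, -1 down) per study for a DEG.

    `majority` is the direction taken by the larger set of studies.
    If `rng` is given, the assignment of directions to studies is
    shuffled; this placement is an assumption of the sketch.
    """
    n_major = MAJORITY_COUNTS[category]
    pattern = [majority] * n_major + [-majority] * (N_STUDIES - n_major)
    if rng is not None:
        rng.shuffle(pattern)
    return pattern


if __name__ == "__main__":
    rng = random.Random(0)
    for category in (1, 2, 3):
        print(category, direction_pattern(category, rng=rng))
```

The net direction (the sum of the vector) shrinks from category 1 to category 3, which is one way to see why methods that assume a consistent direction across studies may struggle as heterogeneity grows.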
Summary
Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question in such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Its extensive applications have been generating and accumulating a flood of omics data that bring an unprecedented opportunity for elucidating cancer and other diseases at the molecular level [3,4,5,6]. Various types of omics data for nearly 10,000 tumor or normal samples have been released by The Cancer Genome Atlas (TCGA) project. Meta-analysis of transcriptomic data needs to interrogate consistent but subtle gene activity patterns across studies. There are three categories of meta-analysis methods used for DEG identification: p-value-based, effect size-based, and rank-based. These methods deal with nonspecific variations at different levels of data. The performance of the p-value methods is stringently conditional on the estimation model of p-values used in individual