Detecting discordance enrichment among a series of two-sample genome-wide expression data sets

Yinglei Lai, Reza Modarres, Fanni Zhang, Tapan K Nayak, Norman H Lee, Timothy A McCaffrey

doi:10.1186/s12864-016-3265-2

Open Access

https://doi.org/10.1186/s12864-016-3265-2

Abstract

Background: With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest.

Methods: In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient, with a parameter space that grows only linearly as the number of data sets increases. The model-based probability of discordance enrichment can be calculated for gene set detection.

Results: We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets (as the log-ratio increases from negative to positive). We focus on KEGG pathways. The detected pathways will be useful for our further understanding of the role of gene PNLIP in pancreatic cancer research. Among the top detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. Then, we consider gene TP53, which is well known for its role as a tumor suppressor in cancer research. Its log-ratio also ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed similar overall results, and the above two pathways are still the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data.

Conclusions: This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.
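The subsetting step described in the Results can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `split_pairs_by_log_ratio`, the simulated data, the choice of three subsets, and the assumption that expression values are positive raw intensities (so a log2 ratio is meaningful) are all hypothetical. It sorts the matched pairs by the tumor/non-tumor log-ratio of one chosen gene and cuts the sorted order into non-overlapping, near-equal groups.

```python
import numpy as np


def split_pairs_by_log_ratio(tumor, non_tumor, gene_row, n_subsets):
    """Order matched pairs by one gene's log2(tumor / non-tumor) ratio
    and split them into non-overlapping subsets of near-equal size.

    tumor, non_tumor: arrays of shape (n_genes, n_pairs), positive values.
    Returns the per-pair log-ratios and a list of pair-index arrays,
    ordered from the most negative to the most positive log-ratio.
    """
    tumor = np.asarray(tumor, dtype=float)
    non_tumor = np.asarray(non_tumor, dtype=float)
    log_ratio = np.log2(tumor[gene_row] / non_tumor[gene_row])
    order = np.argsort(log_ratio, kind="stable")  # ascending log-ratio
    return log_ratio, np.array_split(order, n_subsets)


# Simulated stand-in data: 100 genes, 45 matched pairs (hypothetical values).
rng = np.random.default_rng(0)
n_genes, n_pairs = 100, 45
tumor = rng.lognormal(mean=5.0, sigma=1.0, size=(n_genes, n_pairs))
non_tumor = rng.lognormal(mean=5.0, sigma=1.0, size=(n_genes, n_pairs))

# Row 0 plays the role of the stratifying gene (e.g. PNLIP or TP53).
log_ratio, subsets = split_pairs_by_log_ratio(tumor, non_tumor, gene_row=0, n_subsets=3)
for k, idx in enumerate(subsets):
    print(k, len(idx), log_ratio[idx].min(), log_ratio[idx].max())
```

Each resulting subset could then be treated as one two-sample data set in the series; how many subsets to use and where to place the cut points are design choices that the abstract does not specify.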

Highlights

With the current microarray and RNA sequencing (RNA-seq) technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies
The detected pathways will be useful for our further understanding of the role of gene PNLIP in pancreatic cancer research
As we have explained in the Methods, we expect to identify pathways with enrichment in clearly discordant gene behaviors among a series of pre-defined genome-wide expression data sets. (Notice that a pathway with discordance enrichment score (DES) ∼ 1 is significantly enriched in clearly discordant behaviors; and a pathway with DES ∼ 0 is not enriched in clearly discordant behaviors)

Summary

Introduction

Genome-wide expression data have been widely collected in biological and medical studies with the recent microarray [1,2,3] and RNA-seq [4, 5] technologies. Gene set enrichment analysis enables us to detect weak but coherent changes in individual genes by aggregating information from a specific group of genes. In the current public databases, large genome-wide expression data sets or multiple genome-wide expression data sets have been made available [3, 9]. When multiple data sets are available, integrative analysis enables us to detect weak but coherent changes in individual data sets by aggregating information from different data sets [10,11,12].

Journal: BMC Genomics	Publication Date: Jan 1, 2017
Citations: 5	License type: open-access

Similar Papers

Insights into Pancreatic Cancer Etiology from Pathway Analysis of Genome-Wide Association Study Data
Peng Wei ... Donghui Li
PLoS ONE | VOL. 7
04 Oct 2012

Head and Neck Squamous Cell Carcinoma Subtypes Based on Immunologic and Hallmark Gene Sets in Tumor and Non-tumor Tissues.
Ji Yin ... Lanxin Hu
Frontiers in surgery | VOL. 9
03 Feb 2022

Abstract 475: Clinical significance of eif5-mimic protein 1 in pancreatic cancer
Yushi Motomura ... Kuniaki Sato
Cancer Research | VOL. 79
01 Jul 2019

Abstract A022: Neoadjuvant ablative radiation downstages high-risk features and alters immune response in pancreatic cancer
Peter Q Leung ... Megan Wachsmann
Cancer Research | VOL. 84
15 Sep 2024
