Abstract

Gene expression data can provide a very rich source of information for elucidating biological function at the pathway level, provided the experimental design considers the needs of the statistical analysis methods. The purpose of this paper is to provide a comparative analysis of statistical methods for detecting the differential expression of pathways (DEP). In contrast to many other studies conducted so far, we use three novel simulation types that produce a more realistic correlation structure than previous simulation methods. This also includes the generation of surrogate data from two large-scale microarray experiments on prostate cancer and ALL. As a result of our comprehensive analysis of parameter configurations, we find that each method should only be applied if certain conditions of the data from a pathway are met. Further, we provide method-specific estimates of the optimal sample size for microarray experiments that aim to identify DEP, in order to avoid an underpowered design. Our study highlights the sensitivity of the studied methods to the parameters of the system.

Highlights

  • The functional analysis of high-throughput data is a challenging but promising direction in the post-genomics era

  • For the transcriptional regulatory network, we select 200 different but overlapping pathways that comprise a total of $p = 1199$ genes for algorithms (1) and (2). Parameters studied for both algorithms: we study the influence of the sample size ($n \in N = \{5, 10, 15, \ldots, 45\}$, $|N| = 9$) and the detection call ($DC \in \mathcal{DC} = \{0\%, 10\%, 30\%, 60\%\}$, $|\mathcal{DC}| = 4$)

  • Most other microarray experiments usually provide fewer than 50 samples per condition, which makes our choice reasonable from a biological point of view
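The parameter sets named in the highlights can be enumerated in a few lines. The sketch below lists the nine sample sizes and four detection-call levels and pairs them. Crossing every sample size with every detection call is an assumption made here for illustration; the highlights state the two parameter sets but not how they are combined. The variable names are hypothetical.

```python
from itertools import product

# Sample sizes per condition: 5, 10, ..., 45 (nine values, as stated in the highlights).
sample_sizes = list(range(5, 50, 5))

# Detection-call levels in percent (four values, as stated in the highlights).
detection_calls = [0, 10, 30, 60]

# Assumption: a full grid, pairing every sample size with every detection call.
configurations = list(product(sample_sizes, detection_calls))

print(len(sample_sizes), len(detection_calls), len(configurations))  # prints "9 4 36"
```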


Summary

Introduction

The functional analysis of high-throughput data is a challenging but promising direction in the post-genomics era. In the context of expression data, interest has shifted in recent years from approaches focusing on the analysis of individual genes and detecting their differential expression [5,6,7] toward the analysis of gene sets in order to identify differentially expressed sets of genes [8,9,10,11]. The rationale behind this is that genes and their products do not work in isolation but interact with each other in a concerted manner in order for a phenotype to emerge [12]. A hypothesis test comparing a gene set to all other gene sets available is called competitive, whereas a test comparing the same gene set for two different phenotypes is called self-contained.
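The distinction between the two kinds of test can be made concrete with a small permutation sketch. It is not one of the methods compared in the paper. The score (mean absolute difference of group means), the function names, and the toy data are all assumptions chosen for illustration. The self-contained variant permutes sample labels and looks only at the gene set itself, while the competitive variant compares the gene set against randomly drawn gene sets of the same size.

```python
import random
from statistics import mean


def set_score(expr, genes, group_a, group_b):
    """Mean absolute difference of the two group means over the genes in the set."""
    return mean(
        abs(mean(expr[g][s] for s in group_a) - mean(expr[g][s] for s in group_b))
        for g in genes
    )


def self_contained_p(expr, gene_set, group_a, group_b, n_perm=500, seed=0):
    """Permute sample labels: does this set differ between the two phenotypes?"""
    rng = random.Random(seed)
    observed = set_score(expr, gene_set, group_a, group_b)
    samples = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(samples)
        perm_a, perm_b = samples[:len(group_a)], samples[len(group_a):]
        if set_score(expr, gene_set, perm_a, perm_b) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def competitive_p(expr, gene_set, group_a, group_b, n_perm=500, seed=0):
    """Resample genes: does this set score higher than random sets of equal size?"""
    rng = random.Random(seed)
    observed = set_score(expr, gene_set, group_a, group_b)
    all_genes = list(expr)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(all_genes, len(gene_set))
        if set_score(expr, random_set, group_a, group_b) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 20 genes, 10 samples; the first 5 genes are shifted
# upward in the second phenotype (samples 5 to 9).
data_rng = random.Random(1)
group_a, group_b = list(range(5)), list(range(5, 10))
expr = {
    f"g{i}": [data_rng.gauss(0, 1) + (3.0 if i < 5 and s >= 5 else 0.0) for s in range(10)]
    for i in range(20)
}
pathway = [f"g{i}" for i in range(5)]

p_self = self_contained_p(expr, pathway, group_a, group_b)
p_comp = competitive_p(expr, pathway, group_a, group_b)
print(f"self-contained p = {p_self:.3f}, competitive p = {p_comp:.3f}")
```

The two p-values answer different questions: the first asks whether the pathway changes between phenotypes at all, the second whether it changes more than a typical gene set drawn from the same experiment.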

