Abstract

Background

Rigorous analysis of gene expression data requires taking the assumptions behind each test into consideration, and it also relies on information about the network relations that exist among genes. Combining these elements can not only improve statistical power but also provide a better framework through which gene expression can be properly analyzed.

Material and methods

We propose a novel statistical model that incorporates both assumptions and gene network information into the analysis. Assumptions are important because every test statistic is valid only when its required assumptions hold. We therefore propose hybrid p-values that take these assumptions into consideration, and we show that, under the null hypothesis of primary interest, these p-values are uniformly distributed. We incorporate gene network information because neighboring genes share biological functions; this correlation is taken into account by assigning similar prior probabilities to neighboring genes.

Results

We compare our approach with other approaches in a series of simulations. Areas under the ROC curve (AUCs) are computed to compare the methodologies, and the AUC based on our methodology is larger than the others. For regression analysis, the area under the ROC curve of our proposed method contains those of the Spearman test and the Pearson test. True negative rates (TNRs), also known as specificities, are also higher with our approach than with the other approaches. For two-group comparison with a sample size of n=10, for instance, the specificity of our proposed methodology is 0.716146, compared with 0.689223 for the t-test and 0.69797 for the rank sum test. Our method, which combines assumptions and network information in the analysis, is shown to be more powerful.

Conclusions

These procedures are introduced as a general class of methods that can incorporate procedure selection, account for multiple testing, and incorporate graphical network information into the analysis. We obtain very good performance in simulations and in real data analysis.
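
The claim that p-values are uniformly distributed under the null hypothesis can be checked empirically for any test. The sketch below does this for an ordinary two-sample t-test, not for the hybrid p-values proposed here, whose construction is not given in this summary. The function name `null_p_values`, the gene count, and the normal data model are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

def null_p_values(n_genes=2000, n_per_group=10, seed=0):
    """Simulate two groups with no true difference and return t-test p-values."""
    rng = np.random.default_rng(seed)
    # Assumption: every gene is standard normal in both groups, so the null holds.
    x = rng.normal(size=(n_genes, n_per_group))
    y = rng.normal(size=(n_genes, n_per_group))
    return stats.ttest_ind(x, y, axis=1).pvalue

p = null_p_values()

# Under the null, roughly 5% of p-values should fall below 0.05.
print(f"fraction of p-values below 0.05: {np.mean(p < 0.05):.3f}")

# A Kolmogorov-Smirnov test compares the p-values with Uniform(0, 1).
ks = stats.kstest(p, "uniform")
print(f"KS statistic against Uniform(0, 1): {ks.statistic:.3f}")
```

If the test's assumptions were violated (for example, heavy-tailed data with very small samples), the same check would be one way to see the p-values drift away from uniformity.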

Highlights

  • Gene expression data can be analyzed in a multiple testing setting as well as with many other statistical methods

  • Areas under the receiver operating characteristic (ROC) curve (AUCs) are computed to compare the different methodologies; the AUC based on our methodology is larger than the others

  • True negative rates (TNRs), also known as specificities, are higher with our approach than with the other approaches
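
As a rough illustration of the two metrics named in the highlights, the sketch below computes an AUC and a specificity for the t-test and the rank sum test on simulated two-group data. It does not implement the proposed method. The helper names `auc_from_scores` and `specificity`, the 0.05 threshold, and the simulation settings are assumptions, so the printed numbers are not expected to match the values reported in the abstract.

```python
import numpy as np
from scipy import stats

def auc_from_scores(scores, is_signal):
    """AUC via the Mann-Whitney identity: P(signal score > null score)."""
    scores = np.asarray(scores, dtype=float)
    is_signal = np.asarray(is_signal, dtype=bool)
    ranks = stats.rankdata(scores)  # average ranks handle ties
    n_pos = is_signal.sum()
    n_neg = (~is_signal).sum()
    u = ranks[is_signal].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def specificity(p_values, is_signal, alpha=0.05):
    """True negative rate: share of true nulls not rejected at level alpha."""
    p_values = np.asarray(p_values, dtype=float)
    is_signal = np.asarray(is_signal, dtype=bool)
    return np.mean(p_values[~is_signal] >= alpha)

# Hypothetical simulation: 1000 genes, 10 samples per group, about 20% shifted.
rng = np.random.default_rng(1)
n_genes, n = 1000, 10
is_signal = rng.random(n_genes) < 0.2
shift = np.where(is_signal, 1.5, 0.0)[:, None]
x = rng.normal(size=(n_genes, n))
y = rng.normal(size=(n_genes, n)) + shift

p_t = stats.ttest_ind(x, y, axis=1).pvalue
p_rank = np.array([
    stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
    for a, b in zip(x, y)
])

# Smaller p-values mean stronger evidence, so 1 - p serves as the score.
for name, p in [("t-test", p_t), ("rank sum", p_rank)]:
    print(name,
          "AUC:", round(auc_from_scores(1 - p, is_signal), 3),
          "specificity:", round(specificity(p, is_signal), 3))
```

Using the rank-based identity avoids tracing the ROC curve explicitly; it gives the same area as integrating the empirical curve.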


Summary

Introduction

Gene expression data can be analyzed in a multiple testing setting as well as with many other statistical methods. In addition to incorporating distributional assumptions into the overall testing, it may be informative to incorporate any prior knowledge of association between entities (Bowman and George 1995). Such associations are often recorded by graphical networks (Wei and Pan 2008). Analyzing gene expression data rigorously requires taking assumptions into consideration and relies on using information about the network relations that exist among genes. Combining these elements not only improves statistical power but also provides a better framework through which gene expression data can be properly analyzed. We propose a novel statistical approach that incorporates testing for distributional assumption validity with prior information provided by a gene graphical network.
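
To make the idea of network-informed priors concrete, here is a minimal sketch of one way that neighboring genes could be given similar prior probabilities: each gene's prior is repeatedly pulled toward the mean prior of its neighbors in the graph. This is an illustrative smoothing scheme, not the model proposed in the paper, which is not specified in this summary. The function `smooth_priors`, its `weight` and `n_iter` parameters, and the toy network are all hypothetical.

```python
import numpy as np

def smooth_priors(priors, edges, weight=0.5, n_iter=10):
    """Pull each gene's prior toward the mean prior of its network neighbors."""
    priors = np.asarray(priors, dtype=float)
    n = priors.size

    # Build a symmetric adjacency matrix from the undirected edge list.
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    degree = adj.sum(axis=1)

    current = priors.copy()
    for _ in range(n_iter):
        # Genes without neighbors keep their current value as the "neighbor mean".
        neighbor_mean = np.divide(
            adj @ current, degree, out=current.copy(), where=degree > 0
        )
        # Convex combination keeps every value inside [0, 1].
        current = (1 - weight) * priors + weight * neighbor_mean
    return current

# Toy network: genes 0 and 1 are neighbors, gene 2 is isolated.
initial = [0.9, 0.1, 0.5]
smoothed = smooth_priors(initial, edges=[(0, 1)])
print(smoothed)
```

In this toy example the priors of the two connected genes move closer together while the isolated gene is left unchanged; a larger `weight` gives the network more influence relative to each gene's own prior.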

