Abstract

Background: Microarrays are a biotechnology that measures the expression of thousands of genes in a single assay. A two-group microarray study yields gene expression measurements for patients with a disease of interest and for healthy controls. Successful identification of genes that differentiate the two groups can lead to new and improved treatments. While microarrays represent an exciting avenue for clinicians, the analysis of the large amount of data produced by these experiments comes with many challenges. Three characteristics of microarray data need to be addressed in the analysis: the large number of features measured on a relatively small number of patients (the so-called p ≫ N problem), the small variability of some genes, and the correlations across genes.

Objective: Our objective is to apply methods of microarray data analysis to kidney transplant patients. The two groups consist of patients experiencing a more severe type of rejection, T-cell mediated rejection (TCMR), and patients experiencing borderline rejection.

Methods: We apply Significance Analysis of Microarrays (SAM) to explore genes differentially expressed between the two groups. While SAM is a sound statistical method for exploring the data at the gene level, an output of thousands of significant genes is hard to interpret. The focus has therefore shifted towards analysis at the gene set level. Biologists have assembled databases of genes grouped by biological function, called biological pathways or gene sets. Analysis at the gene set (pathway) level, called Gene Set Analysis (GSA), is easier to interpret and more robust, in the sense that significant gene sets are more likely to be replicated across studies and microarray platforms. GSA addresses the p ≫ N problem via permutation tests. GSA methods can be broadly classified into self-contained methods, based on permutations of the group labels across subjects, and competitive methods, based on gene permutations. We prefer the former, as they preserve the correlations among genes in a pathway. We present the two top self-contained methods, Significance Analysis of Microarrays for Gene Sets (SAM-GS) and Multivariate Analysis of Variance for Gene Sets (MANOVA-GSA), as they perform best according to previous simulation studies and real applications. We also present results of the most popular GSA method, which is a hybrid between self-contained and competitive methods. False Discovery Rates (FDRs) are calculated to address multiple hypothesis testing.

Results: Our data consist of expression measurements for 54,675 probes on 17 kidney transplant patients experiencing TCMR and 27 kidney transplant patients experiencing borderline rejection. The 54,675 probes were reduced to 20,736 unique genes. For gene sets, we use the most recent version of the C2 catalogue, consisting of 1,892 gene sets representing metabolic and signalling pathways from online pathway databases, gene sets from biomedical literature including 340 PubMed articles, and gene sets compiled from published mammalian microarray studies. We restricted gene set size to between 5 and 500 genes, resulting in 1,839 gene sets used for our analysis. We found 957 significant genes with FDR values smaller than 5.71%. SAM-GS identified 58 pathways with p-value < 0.001 (FDRs < 1.8%). Among these, CDK5 and Interferon-gamma are just two examples of pathways previously established as associated with kidney transplant rejection.
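
To make the self-contained approach concrete, the following is a minimal sketch of a SAM-GS-style test, not the code used in this study. It sums the squared SAM-like statistics of the genes in one gene set and obtains a p-value by permuting the group labels across subjects, which leaves the correlations among genes intact. The function names, the fixed fudge factor `s0 = 0.1`, the add-one p-value estimate, and the synthetic data are all assumptions made for illustration; SAM itself estimates `s0` from the data.

```python
import numpy as np


def sam_d(expr, in_group1, s0):
    """SAM-like moderated t statistic for each gene (row) of expr.

    expr: array of shape (n_genes, n_subjects).
    in_group1: boolean array of length n_subjects, True for group 1.
    s0: small positive constant that stabilises low-variance genes
        (a fixed value here, which is an assumption of this sketch).
    """
    x1, x2 = expr[:, in_group1], expr[:, ~in_group1]
    n1, n2 = x1.shape[1], x2.shape[1]
    diff = x1.mean(axis=1) - x2.mean(axis=1)
    # Pooled within-group sum of squares per gene.
    ss = x1.var(axis=1) * n1 + x2.var(axis=1) * n2
    s = np.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return diff / (s + s0)


def sam_gs_pvalue(expr, in_group1, gene_rows, s0=0.1, n_perm=1000, seed=0):
    """Permutation p-value for one gene set, permuting group labels."""
    rng = np.random.default_rng(seed)
    in_group1 = np.asarray(in_group1, dtype=bool)
    sub = expr[gene_rows]  # keep only the genes in this set
    observed = np.sum(sam_d(sub, in_group1, s0) ** 2)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle labels across subjects; each subject keeps its whole
        # expression profile, so gene-gene correlations are preserved.
        shuffled = rng.permutation(in_group1)
        if np.sum(sam_d(sub, shuffled, s0) ** 2) >= observed:
            exceed += 1
    # Add-one estimate, so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Synthetic illustration only: 17 vs 27 subjects, 200 simulated genes.
demo_rng = np.random.default_rng(1)
labels = np.array([True] * 17 + [False] * 27)
expr = demo_rng.normal(size=(200, 44))
expr[:10, labels] += 2.0  # first 10 genes are shifted in group 1

p_shifted = sam_gs_pvalue(expr, labels, np.arange(10), n_perm=500)
p_null = sam_gs_pvalue(expr, labels, np.arange(100, 110), n_perm=500)
print(p_shifted, p_null)
```

In a full analysis, the same permutations would be reused across all gene sets, and the resulting p-values would then be converted to FDR estimates to account for multiple testing.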
