Abstract

Summary: Complex human diseases can show significant heterogeneity between patients with the same phenotypic disorder. An outlier detection strategy was developed to identify variants at the level of gene transcription that are of potential biological and phenotypic importance. Here we describe a graphical software package, z-score outlier detection (ZODET), that enables identification and visualisation of gross abnormalities in gene expression (outliers) in individuals, using whole genome microarray data. The mean and standard deviation of expression in a healthy control cohort are used to detect both over- and under-expressed probes in individual test subjects. We compared the potential of ZODET to detect outlier genes in gene expression datasets with that of a previously described statistical method, gene tissue index (GTI), using a simulated expression dataset and a publicly available monocyte-derived macrophage microarray dataset. Taken together, these results support ZODET as a novel approach to identify outlier genes of potential pathogenic relevance in complex human diseases. The algorithm is implemented using R packages and Java.

Availability: The software is freely available from http://www.ucl.ac.uk/medicine/molecular-medicine/publications/microarray-outlier-analysis.
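The z-score step described in the summary can be illustrated with a minimal Python sketch. This is not the ZODET implementation (which uses R packages and Java). The function name `zscore_outliers`, the probe names, the expression values, the cutoff of 3 standard deviations, and the use of the sample standard deviation are all assumptions made for illustration only.

```python
from statistics import mean, stdev


def zscore_outliers(controls, test_subject, threshold=3.0):
    """Flag probes whose expression in one test subject lies far from the control mean.

    controls: dict mapping probe ID -> list of expression values in healthy controls
    test_subject: dict mapping probe ID -> expression value in the test subject
    threshold: absolute z-score cutoff (hypothetical default, not taken from the paper)
    """
    outliers = {}
    for probe, values in controls.items():
        if probe not in test_subject or len(values) < 2:
            continue  # a z-score needs a test value and at least two controls
        mu = mean(values)
        sigma = stdev(values)  # sample standard deviation (assumption)
        if sigma == 0:
            continue  # no variation in controls, z-score undefined
        z = (test_subject[probe] - mu) / sigma
        if abs(z) >= threshold:
            direction = "over" if z > 0 else "under"
            outliers[probe] = (direction, z)
    return outliers


# Made-up log-scale expression values for three hypothetical probes.
controls = {
    "probe_A": [8.0, 8.2, 7.9, 8.1, 8.0],
    "probe_B": [5.0, 5.1, 4.9, 5.0, 5.0],
    "probe_C": [6.0, 6.3, 5.8, 6.1, 5.8],
}
test_subject = {"probe_A": 10.5, "probe_B": 5.05, "probe_C": 3.0}

result = zscore_outliers(controls, test_subject)
for probe, (direction, z) in sorted(result.items()):
    print(f"{probe}: {direction}-expressed (z = {z:.1f})")
```

In this toy input, `probe_A` and `probe_C` lie many control standard deviations from the control mean and are flagged as over- and under-expressed respectively, while `probe_B` stays within the cutoff and is not reported.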

Highlights

  • Many human diseases, such as inflammatory bowel disease and type 1 diabetes, are complex, multifactorial syndromes with genetic and environmental determinants

  • The gene tissue index (GTI) method has been shown to perform comparably to other outlier detection methods, including cancer outlier profile analysis (COPA), outlier sums (OS), and the outlier robust t-statistic (ORT), when using a simulated expression dataset [5,6,12]

  • To benchmark our z-score outlier detection (ZODET) method against these existing methods, we performed a similar comparison with the GTI method, using both simulated expression datasets and an expression dataset compiled from a cohort of human monocyte-derived macrophage samples split into two equal-sized groups


Introduction

Many human diseases, such as inflammatory bowel disease and type 1 diabetes, are complex, multifactorial syndromes with genetic and environmental determinants. It has been postulated that rare or low-frequency variants, structural rearrangements (such as deletions, insertions, and translocations), and epigenetic variation could be important in the pathogenesis of these complex disorders and account for the observed heterogeneity [1]. All of these are incompletely assessed by current genome-wide association studies (GWAS). A recent paper combined population-scale human genomic sequence data with transcriptomic data and identified an enrichment of rare variants associated with outlier gene expression [2]. The authors concluded that, across multiple tissues and developmental stages, an individual would be expected to have hundreds of rare variants with large effects on gene expression. The examination of significantly overexpressed genes in individual patients (or subgroups of patients) has been employed successfully in the field of cancer genomics [3].

