Abstract

Several tools have been developed for cluster analysis and heatmap construction since these techniques were introduced. Applying such tools to the number and types of genome-wide data available from next-generation sequencing (NGS) technologies requires adapting statistical concepts, such as the definition of a most variable gene set, and more intricate cluster analysis methods that address multiple omic data types. Additionally, the growing number of publicly available datasets has created the desire to estimate the statistical significance of a gene signature derived from one dataset to similarly group samples based on another dataset. The number of currently available tools and their combined use for generating heatmaps, along with the several adaptations of statistical concepts for addressing the higher dimensionality of genome-wide NGS-derived data, has created a further challenge in the ability to replicate heatmap results. We introduce NOJAH (NOt Just Another Heatmap), an interactive tool that defines and implements a workflow for genome-wide cluster analysis and heatmap construction by creating and combining several tools into a single user interface. NOJAH includes several newly developed scripts for techniques that, though frequently applied, are not sufficiently documented to allow results to be replicated. These techniques include defining a most variable gene set (a.k.a. ‘core genes’), estimating the statistical significance of a gene signature to separate samples into clusters, and performing a result-merging integrated cluster analysis. With only a user-uploaded dataset, NOJAH provides as output, among other things, the minimum documentation required for replicating heatmap results. Additionally, NOJAH contains five different existing R packages that are connected in the interface by their functionality as part of a defined workflow for genome-wide cluster analysis.
The NOJAH application tool is available at http://bbisr.shinyapps.winship.emory.edu/NOJAH/ with corresponding source code available at https://github.com/bbisr-shinyapps/NOJAH/.

Highlights

  • NOJAH is organized into three separate analysis workflows that correspond to the use cases highlighted in Fig 1: (1) genome-wide heatmap (GWH) analysis, (2) combined results cluster (CrC) analysis, and (3) gene set significance of cluster (SoC) analysis

  • Prognostic gene signatures within a cancer type constitute a set of genes whose expression changes reveal important information about tumor diagnosis, prognosis and even therapeutic response [6, 15]

  • A recently developed heatmap tool, shinyHeatmap [16], was unable to compute results for our CoMMpass RNA-Seq gene-level expression genome-wide dataset of 560 samples with 60,000 rows. Another commonly used heatmap tool, Morpheus, allows the user to filter genes based on numerous descriptive measures (e.g., VAR, median absolute deviation (MAD), maximum, minimum, mean) without providing much evidence as to which may be more appropriate for a dataset
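The choice between descriptive measures matters because they can rank the same genes differently. The following is a minimal sketch in plain Python, not code from NOJAH or Morpheus: the gene names, the toy expression values, and the helper names `mad` and `rank_genes` are all hypothetical. It shows a single outlying sample pushing a gene to the top of a variance ranking while the median absolute deviation ranks it lower.

```python
from statistics import median, variance


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def rank_genes(expr, measure):
    """Return gene names ordered from most to least variable under `measure`."""
    return sorted(expr, key=lambda gene: measure(expr[gene]), reverse=True)


# Hypothetical toy expression values (one list of samples per gene).
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0, 5.2, 12.0],  # nearly flat, one outlying sample
    "geneB": [2.0, 4.0, 6.0, 3.0, 5.0, 7.0],   # spread across all samples
    "geneC": [3.0, 3.0, 3.1, 2.9, 3.0, 3.0],   # nearly flat
}

by_var = rank_genes(expr, variance)  # sample variance per gene
by_mad = rank_genes(expr, mad)       # MAD per gene

print("ranked by VAR:", by_var)
print("ranked by MAD:", by_mad)
```

Here the variance ranking places `geneA` first because of its single extreme sample, whereas the MAD ranking places `geneB` first. Which behavior is preferable depends on the dataset, so documenting the measure used is part of making a heatmap reproducible.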

Summary

Introduction

Data from next-generation sequencing (NGS) technologies have created a level of dimensionality that greatly exceeds that of prior, microarray-based genome-wide datasets. The concept of a ‘most variable’ gene set was introduced to address the much higher dimension of NGS data: genes that show little to no difference among samples for a given molecular data type are filtered out, and the cluster analysis is performed on the remaining ‘core gene set.’ This approach has led to several definitions of a core gene set, most of which are insufficiently documented to be replicated. A genome-wide variant heatmap, for example one based on somatic mutations, poses a further challenge because the data are generally sparse. For this reason, we have created an integrated most variable analysis approach that helps address the sparseness often associated with variant data, so that a reduced-dimension, most variable gene set can also be defined in this case.
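To illustrate why sparseness is a problem for variability-based filtering, consider a binary mutation matrix. The sketch below uses plain Python and is an illustration only: it is not NOJAH's integrated approach, which this excerpt does not describe in detail, and the gene names, the toy calls, and the helpers `mad`, `mutation_frequency` and `frequently_mutated` are hypothetical. It shows that the median absolute deviation is zero for any gene mutated in fewer than half of the samples, so a MAD-based filter cannot separate such genes, whereas a simple mutation-frequency threshold still can.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def mutation_frequency(calls):
    """Fraction of samples carrying a mutation (calls are 0 or 1)."""
    return sum(calls) / len(calls)


def frequently_mutated(variants, min_fraction):
    """Genes mutated in at least `min_fraction` of samples (assumed cutoff)."""
    return [
        gene
        for gene, calls in variants.items()
        if mutation_frequency(calls) >= min_fraction
    ]


# Hypothetical binary somatic-mutation calls for 10 samples per gene.
variants = {
    "geneX": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],  # mutated in 30% of samples
    "geneY": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # mutated in 10% of samples
    "geneZ": [0] * 10,                        # never mutated
}

mads = {gene: mad(calls) for gene, calls in variants.items()}
kept = frequently_mutated(variants, min_fraction=0.2)

print("MAD per gene:", mads)
print("kept by frequency filter:", kept)
```

All three genes receive a MAD of zero even though their mutation frequencies differ, which is one concrete way sparse variant data defeat a measure that works well for continuous expression values.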
