Abstract

Contextual functional interpretation of -omics data derived from clinical samples is a classical and difficult problem in computational systems biology. Measuring thousands of data points on single samples has become routine, but relating such 'big data' datasets to the complexities of human pathobiology remains an area of ongoing research. A further complication is that many publicly available datasets consist of bulk transcriptomics data from complex tissues such as blood. The most prevalent analytic approaches derive molecular 'signatures' of disease states or apply modular analysis frameworks to the data. Here we describe ANIMA (association network integration for multiscale analysis), a network-based data integration method that takes clinical phenotype and microarray data as inputs. ANIMA is implemented in R and Neo4j and runs in Docker containers. In short, the build algorithm iterates over one or more transcriptomics datasets and generates a large, multipartite association network by executing multiple independent analytic steps (differential expression, deconvolution, co-expression-based modular analysis, pathway analysis) and integrating their results. Once built, the network can be queried directly using Cypher (a graph query language) or through custom functions that communicate with the graph database via language-specific APIs. We also developed a web application using Shiny that provides fully interactive, multiscale views of the data. Using this approach, we show that we can reconstruct multiple features of disease states at various scales of organization, from transcript abundance patterns of individual genes, through co-expression patterns of groups of genes, to patterns of cellular behaviour in whole blood samples, both in single experiments and in meta-analyses of multiple datasets.
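
To illustrate the integration idea, the following Python sketch merges the outputs of independent analytic steps into one multipartite graph and then runs a simple traversal over it. It is not ANIMA's implementation (which uses R and Neo4j); the triple format, node types, relation names and the example dataset, gene, module and pathway identifiers are all hypothetical. The traversal plays roughly the role that a Cypher path query would play against the graph database.

```python
from collections import defaultdict


def build_network(step_results):
    """Merge (source, relation, target) triples produced by independent
    analytic steps into one multipartite graph.

    Returns (nodes, edges): nodes maps node id -> node type, and edges maps
    source id -> set of (relation, target id).
    """
    nodes = {}
    edges = defaultdict(set)
    for triples in step_results.values():
        for (src_type, src), relation, (dst_type, dst) in triples:
            nodes[src] = src_type
            nodes[dst] = dst_type
            edges[src].add((relation, dst))
    return nodes, edges


def reachable_of_type(nodes, edges, start, node_type):
    """Breadth-first search from `start`; return every reachable node of `node_type`."""
    seen, frontier, hits = {start}, [start], set()
    while frontier:
        next_frontier = []
        for node in frontier:
            for _relation, dst in edges.get(node, ()):
                if dst in seen:
                    continue
                seen.add(dst)
                next_frontier.append(dst)
                if nodes[dst] == node_type:
                    hits.add(dst)
        frontier = next_frontier
    return hits


# Hypothetical results from three analytic steps run on one dataset.
results = {
    "differential_expression": [
        (("Dataset", "GSE_A"), "DIFF_EXPRESSED", ("Gene", "IFI27")),
        (("Dataset", "GSE_A"), "DIFF_EXPRESSED", ("Gene", "CD3E")),
    ],
    "modular_analysis": [
        (("Gene", "IFI27"), "MEMBER_OF", ("Module", "M1")),
        (("Gene", "CD3E"), "MEMBER_OF", ("Module", "M2")),
    ],
    "pathway_analysis": [
        (("Module", "M1"), "ENRICHED_IN", ("Pathway", "Interferon signalling")),
    ],
}

nodes, edges = build_network(results)
print(sorted(reachable_of_type(nodes, edges, "GSE_A", "Pathway")))
# ['Interferon signalling']
```

Keeping each analytic step's output as independent triples means a step can be added or rerun without touching the others; integration happens only when the triples are merged on shared node identifiers.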

Highlights

  • A frequent issue with bioinformatic analysis is the following scenario: a given dataset is analysed using various approaches in a linear workflow, in an attempt to extract biological features of interest from the data, often at different scales

  • Where the datasets contained samples from multiple timepoints, we restricted the analysis to healthy controls and the first disease timepoint

  • In addition to the standard Neo4j web browser interface, into which a Cypher query can be entered directly and its result returned in the browser window (Figure 2A), we provide a function in R that returns the network found and can plot it within R (Figure 2B), export it as node and edge lists for import into other software such as Cytoscape (Figure 2C), or return it as an igraph[23] object for further manipulation within R. (A file ANIMA_styles.xml is included in the common folder supplied with the source code; this Cytoscape stylesheet reproduces the colouring shown in Figure 2C.)
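
As a rough illustration of the export step, the Python sketch below flattens a small graph into node and edge tables and serialises them as tab-separated text, a format Cytoscape can import as a delimited table. It does not reproduce the R function described above; the column names and the example network are assumptions, and the columns would still need to be mapped to source, interaction and target in Cytoscape's import dialog.

```python
import csv
import io


def node_edge_tables(nodes, edges):
    """Flatten a graph into a node list and an edge list, each with a header row.

    nodes: dict mapping node id -> node type
    edges: iterable of (source, relation, target) tuples
    """
    node_rows = [("id", "type")] + sorted(nodes.items())
    edge_rows = [("source", "interaction", "target")] + sorted(edges)
    return node_rows, edge_rows


def to_tsv(rows):
    """Serialise rows as tab-separated text with Unix line endings."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical example network.
nodes = {"IFI27": "Gene", "M1": "Module", "Interferon signalling": "Pathway"}
edges = [
    ("IFI27", "MEMBER_OF", "M1"),
    ("M1", "ENRICHED_IN", "Interferon signalling"),
]

node_rows, edge_rows = node_edge_tables(nodes, edges)
print(to_tsv(edge_rows))
```

Writing the node table separately lets node attributes (here only the type) drive styling in Cytoscape, while the edge table alone defines the topology.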


Summary

Introduction

A frequent issue with bioinformatic analysis is the following scenario: a given dataset is analysed using various approaches in a linear workflow, in an attempt to extract biological features of interest from the data, often at different scales. Providing the analysis code and the raw data does not by itself guarantee reproducibility, because the computational environment in which the analysis is run can influence the outcome of computations. This becomes a problem when the functionality of a software package changes between versions. In addition to data and source code, one therefore has to provide the exact configuration and package versions of all software involved in the project. The containerization platform Docker[4] is frequently used to fulfil this function and has been widely adopted in reproducible research. This is exemplified by Nextflow[5], a workflow management system that uses Docker.
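
Recording exact package versions can be partly automated. The following Python sketch, offered only as an illustration (ANIMA itself relies on Docker containers for this), captures the interpreter version, the platform string and the installed versions of a chosen list of packages as a JSON manifest. The package list is hypothetical; a manifest like this documents an environment but, unlike a container image, does not recreate it.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages):
    """Record interpreter, OS and package versions for a list of package names.

    Packages that are not installed are recorded as None rather than raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Hypothetical package list; the second name is deliberately not installed.
manifest = environment_manifest(["pip", "this-package-does-not-exist-xyz"])
print(json.dumps(manifest, indent=2))
```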

