Abstract

Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. Such high-dimensional molecular profiles pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple datasets using statistical data fusion, rationalizes contributing evidence and highlights associated genes. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we integrated genes with coding and non-coding mutations and revealed frequently mutated pathways and additional cancer genes with infrequent mutations. We also analyzed prognostic molecular pathways by integrating genomic and transcriptomic features of 1780 breast cancers and highlighted associations with immune response and anti-apoptotic signaling. Integration of ChIP-seq and RNA-seq data for master regulators of the Hippo pathway across normal human tissues identified processes of tissue regeneration and stem cell regulation. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.

Highlights

  • Multi-omics datasets represent distinct aspects of the central dogma of molecular biology

  • The Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium aggregated whole genome sequencing data from 2658 cancers across 38 tumor types generated by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) projects

  • ActivePathways is a versatile method for combining diverse multi-omics datasets

Summary

Introduction

Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we integrated genes with coding and non-coding mutations and revealed frequently mutated pathways and additional cancer genes with infrequent mutations. The PCAWG Consortium aggregated whole genome sequencing data from 2658 cancers across 38 tumor types generated by the ICGC and TCGA projects. These sequencing data were re-analysed with standardized, high-accuracy pipelines to align to the human genome (reference build hs37d5) and identify germline variants and somatically acquired mutations, as described in the PCAWG marker paper[4]. We report the development of the ActivePathways method, which uses data fusion techniques to address the challenge of integrative pathway analysis of multi-omics data. It detects significantly enriched pathways across multiple datasets, including pathways that are not apparent in any individual dataset. ActivePathways is a versatile method for combining diverse multi-omics datasets.
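
The idea of fusing evidence across datasets before testing pathways can be illustrated with a minimal sketch. It is not the ActivePathways implementation: the text above only says "data fusion techniques", so Fisher's combined probability test is used here as a simple stand-in for merging per-gene P-values, followed by a plain hypergeometric over-representation test. All function names, cutoffs, gene labels and P-values are hypothetical, and multiple-testing correction is omitted for brevity.

```python
import math

def fisher_combine(pvalues):
    """Merge independent P-values with Fisher's method (stand-in fusion step)."""
    k = len(pvalues)
    # half_stat is X/2, where X = -2 * sum(ln p) follows chi-square with 2k dof.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # Closed-form chi-square survival function for an even number of dof.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)

def hypergeom_tail(overlap, pathway_size, selected, background):
    """P(X >= overlap) when `selected` genes are drawn from `background` genes."""
    upper = min(pathway_size, selected)
    hits = sum(
        math.comb(pathway_size, i) * math.comb(background - pathway_size, selected - i)
        for i in range(overlap, upper + 1)
    )
    return hits / math.comb(background, selected)

def enriched_pathways(gene_pvalues, pathways, cutoff=0.05, alpha=0.05):
    """Return {pathway: (P-value, contributing genes)} for enriched pathways."""
    merged = {gene: fisher_combine(ps) for gene, ps in gene_pvalues.items()}
    selected = {gene for gene, p in merged.items() if p < cutoff}
    results = {}
    for name, genes in pathways.items():
        members = genes & set(merged)   # restrict pathway to the background
        overlap = members & selected    # genes supporting the enrichment
        p = hypergeom_tail(len(overlap), len(members), len(selected), len(merged))
        if p < alpha:
            results[name] = (p, sorted(overlap))
    return results

# Toy input (hypothetical): two P-values per gene, one per dataset.
toy_pvalues = {f"G{i}": [0.8, 0.9] for i in range(4, 11)}
toy_pvalues.update({"G1": [0.06, 0.07], "G2": [0.01, 0.2], "G3": [0.3, 0.001]})
toy_pathways = {"pathway_A": {"G1", "G2", "G3"}, "pathway_B": {"G4", "G5", "G6"}}
toy_results = enriched_pathways(toy_pvalues, toy_pathways)
```

In this toy input, gene G1 falls short of the 0.05 cutoff in both datasets taken separately but passes it once the two P-values are merged, which is the kind of signal that a single-dataset analysis would miss.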
