Abstract

The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to reveal the genes and specific alterations that drive tumor onset and progression. Although large, comprehensive sequence data sets are increasingly available, data analysis remains a challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that implement many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelizing these tools within Galaxy to reduce runtime, demonstrated their usability, and summarized their runtimes on multiple cloud service providers. Some tools extend or refine existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutations. Workflows that combine these tools for data integration and visualization are demonstrated on a cohort of 96 diffuse large B-cell lymphomas, where they enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms, including Docker.

Highlights

  • An inherent problem in the application of genomics to understand the molecular aetiology of cancer is the multi-disciplinary skillset required for researchers to draw meaningful inferences from high-throughput biological data

  • Although algorithms for handling high-throughput sequence data are steadily being added to Galaxy, there remains a lack of tools and workflows tailored to the common tasks involved in analyzing cancer genome and exome sequence data

  • We demonstrate the utility of the Galaxy Cancer Genomics Toolkit by applying the included workflows to a large cohort of diffuse large B-cell lymphoma (DLBCL) patients (n = 96). Through a combination of analytical and exploratory approaches that leverage multiple visualization tools implemented within the Toolkit, we uncover new candidate lymphoma-related genes and putative genetic features associated with each molecular subgroup

Introduction

An inherent problem in the application of genomics to understand the molecular aetiology of cancer is the multi-disciplinary skillset required for researchers to draw meaningful inferences from high-throughput biological data. Although algorithms for handling high-throughput sequence data are steadily being added to Galaxy, there remains a lack of tools and workflows tailored to the common tasks involved in analyzing cancer genome and exome sequence data.

