Abstract
The field of cancer genomics has demonstrated the power of massively parallel sequencing to identify the genes and specific alterations that drive tumor onset and progression. Although large, comprehensive sequence data sets are increasingly available, their analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that implement many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelizing these tools within Galaxy to accelerate runtime, demonstrated their usability, and summarized their runtimes on multiple cloud service providers. Some tools extend or refine existing toolkits to produce visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutations. Workflows that combine these tools for data integration and visualization are demonstrated on a cohort of 96 diffuse large B-cell lymphomas, where they enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms, including Docker.
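The abstract does not describe how Oncoprintplus builds its cohort-wide summaries. As a rough illustration of the kind of gene-by-sample mutation matrix that OncoPrint-style plots display, here is a minimal Python sketch. The input format (tuples of sample, gene, and variant class standing in for MAF rows), the function names `build_oncoprint` and `render`, and the sorting rules are all assumptions for illustration, not the toolkit's implementation.

```python
from collections import defaultdict


def build_oncoprint(records):
    """Build an OncoPrint-style gene x sample summary.

    records: iterable of (sample_id, gene, variant_class) tuples, a hypothetical
    simplification of MAF rows (the toolkit's real input format may differ).
    Returns (genes, samples, matrix) where matrix[gene][sample] is a sorted list
    of the distinct variant classes seen for that gene in that sample.
    """
    cells = defaultdict(lambda: defaultdict(set))
    all_samples = set()
    for sample, gene, vclass in records:
        cells[gene][sample].add(vclass)
        all_samples.add(sample)

    # Rank genes by how many samples carry at least one mutation (most frequent
    # first); ties are broken alphabetically so the output is deterministic.
    genes = sorted(cells, key=lambda g: (-len(cells[g]), g))

    # Order samples by their presence/absence pattern across the ranked genes so
    # that mutated samples cluster to the left of each row (a "cascade" layout).
    def sample_key(s):
        return (tuple(0 if s in cells[g] else 1 for g in genes), s)

    samples = sorted(all_samples, key=sample_key)
    matrix = {g: {s: sorted(v) for s, v in cells[g].items()} for g in genes}
    return genes, samples, matrix


def render(genes, samples, matrix):
    """Render the matrix as text: 'X' marks a mutated cell, '.' an unmutated one,
    followed by the percentage of samples in which the gene is mutated."""
    width = max(len(g) for g in genes)
    rows = []
    for g in genes:
        pct = 100 * len(matrix[g]) / len(samples)
        row = "".join("X" if s in matrix[g] else "." for s in samples)
        rows.append(f"{g:<{width}} {row} {pct:3.0f}%")
    return "\n".join(rows)


if __name__ == "__main__":
    # Toy, made-up records for illustration only (not data from the study).
    toy = [
        ("S1", "GENE_A", "Missense"),
        ("S2", "GENE_A", "Nonsense"),
        ("S3", "GENE_A", "Missense"),
        ("S3", "GENE_B", "Frameshift"),
        ("S4", "GENE_B", "Missense"),
        ("S4", "GENE_C", "Splice"),
        ("S1", "GENE_A", "Splice"),
    ]
    genes, samples, matrix = build_oncoprint(toy)
    print(render(genes, samples, matrix))
```

Ranking genes by mutation frequency and then ordering samples by their binary mutation pattern is one common way to make co-occurrence and mutual exclusivity visible at a glance; a real tool would also color cells by variant class rather than collapsing them to a single mark.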
Highlights
An inherent problem in the application of genomics to understand the molecular aetiology of cancer is the multi-disciplinary skillset required for researchers to draw meaningful inferences from high-throughput biological data
Although algorithms for handling high-throughput sequence data are steadily being added to Galaxy, there is currently a lack of tools and workflows tailored to common tasks in analyzing cancer genome and exome sequence data
We demonstrate the utility of the Galaxy Cancer Genomics Toolkit by applying the included workflows to a large cohort of diffuse large B-cell lymphoma (DLBCL) patients (n = 96); through a combination of analytical and exploratory approaches that leverage multiple visualization tools implemented within the Toolkit, we uncover new candidate lymphoma-related genes and putative genetic features associated with each molecular subgroup