Scalable Pathogen Pipeline Platform (SP^3): Enabling Unified Genomic Data Analysis with Elastic Cloud Computing

Fan Yang-Turner, Derrick Crook, Jeremy Swann, Philip Fowler, Matthew Bull, Tim Peto, Denis Volk, Thomas Connor, Sarah Hoosdally

doi:10.1109/cloud.2019.00083

Publication Date: Jul 1, 2019
Citations: 1
License type: other-oa

Affiliation: University of Oxford, Cardiff University

Abstract

Pathogen genomic data analysis can be extremely bespoke and diverse. This paper presents our plan and progress towards creating a Scalable Pathogen Pipeline Platform (SP^3), which provides an efficient, unified process for collecting, analysing and comparing genomic data while taking advantage of elastic cloud computing. SP^3 enables container-centric bioinformatic workflows to run on personal computers, high-performance computing (HPC) clusters and cloud platforms. We have deployed and tested SP^3 on a local HPC cluster, Google Cloud Platform (GCP), Microsoft Azure and OpenStack platforms. SP^3 allows users to fetch genomic sequencing data from the European Nucleotide Archive (ENA) and analyse it with open-source bioinformatic pipelines. We believe SP^3 will promote common standards for pathogen genomic data quality, data processing and data analysis, helping to address the challenge of tool divergence and to make use of public genomic data repositories and cloud resources.
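The abstract does not describe how SP^3 retrieves data from ENA. As a rough illustration of the first step (locating the FASTQ files for an accession), the sketch below builds a query against ENA's public Portal API file-report endpoint and parses its tab-separated response. This is not SP^3's code; the helper names `filereport_url` and `parse_fastq_urls` are hypothetical, and the accession used in the example is a placeholder.

```python
from urllib.parse import urlencode

# ENA Portal API file-report endpoint (a public ENA service; SP^3's own
# client code is not described in the paper, so this is only an assumption).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Hypothetical helper: build a query listing FASTQ locations for an accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_fastq_urls(tsv_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: map each run accession to its FASTQ download URLs."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    run_col = header.index("run_accession")
    ftp_col = header.index("fastq_ftp")
    runs: dict[str, list[str]] = {}
    for line in lines[1:]:
        cols = line.split("\t")
        ftp_field = cols[ftp_col] if ftp_col < len(cols) else ""
        # Paired-end runs list two files separated by ';' and ENA omits the
        # URL scheme, so prepend it; runs without files map to an empty list.
        runs[cols[run_col]] = ["ftp://" + p for p in ftp_field.split(";") if p]
    return runs
```

In practice the response text would come from something like `urllib.request.urlopen(filereport_url(acc)).read().decode()`; keeping the network call outside the parser makes the parsing logic easy to test offline and to swap into a containerised pipeline step.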