Abstract

Background: Cloud computing for microbiome data sets can significantly increase working efficiencies and expedite the translation of research findings into clinical practice. The Amazon Web Services (AWS) cloud provides an invaluable option for microbiome data storage, computation, and analysis.

Objective: The goals of this study were to develop a microbiome data analysis pipeline by using AWS cloud and to conduct a proof-of-concept test for microbiome data storage, processing, and analysis.

Methods: A multidisciplinary team was formed to develop and test a reproducible microbiome data analysis pipeline with multiple AWS cloud services that could be used for storage, computation, and data analysis. The microbiome data analysis pipeline developed in AWS was tested by using two data sets: 19 vaginal microbiome samples and 50 gut microbiome samples.

Results: Using AWS features, we developed a microbiome data analysis pipeline that included Amazon Simple Storage Service for microbiome sequence storage, Linux Elastic Compute Cloud (EC2) instances (ie, servers) for data computation and analysis, and security keys to create and manage the use of encryption for the pipeline. Bioinformatics and statistical tools (ie, Quantitative Insights Into Microbial Ecology 2 and RStudio) were installed within the Linux EC2 instances to run microbiome statistical analysis. The microbiome data analysis pipeline was performed through command-line interfaces within the Linux operating system or in the Mac operating system. Using this new pipeline, we were able to successfully process and analyze 50 gut microbiome samples within 4 hours at a very low cost (a c4.4xlarge EC2 instance costs $0.80 per hour). Gut microbiome findings regarding diversity, taxonomy, and abundance analyses were easily shared within our research team.

Conclusions: Building a microbiome data analysis pipeline with AWS cloud is feasible. This pipeline is highly reliable, computationally powerful, and cost effective. Our AWS-based microbiome analysis pipeline provides an efficient tool to conduct microbiome data analysis.
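
To make the cost claim concrete, the compute charge implied by the reported figures can be worked out directly. The sketch below is illustrative only: the function name `estimate_ec2_cost` is hypothetical, the run is assumed to use a single instance billed for the full 4 hours, and storage or data transfer charges (which the abstract does not quantify) are left out.

```python
def estimate_ec2_cost(hours: float, hourly_rate_usd: float, instances: int = 1) -> float:
    """Return the on-demand compute cost in USD, rounded to cents.

    Hypothetical helper for illustration; it covers EC2 instance hours only
    and ignores storage, data transfer, and any discounts.
    """
    if hours < 0 or hourly_rate_usd < 0 or instances < 1:
        raise ValueError("hours and rate must be non-negative; instances must be >= 1")
    return round(hours * hourly_rate_usd * instances, 2)


# Figures taken from the abstract: at most 4 hours on one c4.4xlarge
# instance at $0.80 per hour (assumed here to be billed in full).
gut_run_cost = estimate_ec2_cost(hours=4, hourly_rate_usd=0.80)
print(f"Estimated compute cost for the 50-sample gut run: ${gut_run_cost:.2f}")
```

Under these assumptions, the compute cost of the 50-sample gut microbiome run is bounded by roughly $3.20.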

Highlights

  • Big data and data-driven analysis have become primary drivers of precision health [1,2]

  • The first was Amazon Elastic Block Store (EBS), which is closely integrated with our EC2 instance

  • The Amazon Web Services (AWS) EC2 provides virtual machines that are optimized for running central processing unit (CPU)-intensive cloud-based applications [28]


Summary

Introduction

Big data and data-driven analysis have become primary drivers of precision health [1,2]. Computation and analysis of big data sets in local infrastructures via traditional computational methods (eg, use of personal computers and local computational clusters) often require prolonged run times, which delays further analytic work and postpones the translation of research findings into clinical practice [12]. Another shortcoming of classical data analysis methods is the difficulty of sharing data and findings among research collaborators. The Amazon Web Services (AWS) cloud provides an invaluable option for microbiome data storage, computation, and analysis.

