Abstract

The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments. We offer two cloud-scale RNA-seq pipelines, Arkas-Quantification and Arkas-Analysis, available within Illumina's BaseSpace cloud application platform. Together they expedite Kallisto preparatory routines, reliably calculate differential expression, and perform gene-set enrichment of REACTOME pathways. Due to inherent inefficiencies of scale, Illumina's BaseSpace computing platform offers a massively parallel distributed environment that improves data management services and data importing. Arkas-Quantification deploys Kallisto for parallel cloud computations and is conveniently integrated downstream from the BaseSpace Sequence Read Archive (SRA) import/conversion application, SRA Import. Arkas-Analysis annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata, then performs differential expression and gene-set enrichment analyses on both coding genes and transcripts. The Arkas cloud pipeline supports ENSEMBL transcriptomes and can be used downstream from SRA Import, covering raw sequencing import, SRA-to-FASTQ conversion, RNA quantification, and analysis steps.
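As an illustration of what "extracting structured information directly from source FASTA files" can look like, the following is a minimal Python sketch that splits one ENSEMBL-style cDNA FASTA header into key/value metadata. It is not the Arkas implementation; the function name `parse_ensembl_header` is hypothetical, and the header layout (transcript ID, sequence type, then `key:value` fields ending with a free-text `description:`) is assumed from typical ENSEMBL cDNA releases.

```python
from typing import Dict


def parse_ensembl_header(header: str) -> Dict[str, str]:
    """Split an ENSEMBL-style FASTA header line into a metadata dictionary.

    Assumed layout (illustrative, not taken from the Arkas source):
    >TRANSCRIPT_ID seq_type key:value key:value ... description:free text
    """
    tokens = header.lstrip(">").strip().split()
    record = {"transcript_id": tokens[0]}
    for i, token in enumerate(tokens[1:], start=1):
        key, sep, value = token.partition(":")
        if not sep:
            # A bare token such as "cdna" names the sequence type.
            record["seq_type"] = token
        elif key == "description":
            # The description is free text and runs to the end of the line.
            record["description"] = " ".join(tokens[i:])[len("description:"):]
            break
        else:
            # Keep everything after the first colon, e.g. the full
            # "GRCh38:14:22438547:22438554:1" location string.
            record[key] = value
    return record


example = (
    ">ENST00000415118.1 cdna chromosome:GRCh38:14:22438547:22438554:1 "
    "gene:ENSG00000223997.1 gene_biotype:TR_D_gene "
    "transcript_biotype:TR_D_gene gene_symbol:TRDD1 "
    "description:T cell receptor delta diversity 1"
)
metadata = parse_ensembl_header(example)
print(metadata["transcript_id"], metadata["gene"], metadata["gene_symbol"])
```

A dictionary like this, keyed by transcript ID, could then be joined to per-transcript abundance estimates so that counts can be summarized at the gene level.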

Highlights

  • Bioinformatic workflows based on high-performance computing fall into three main subfamilies: in-house computational packages, virtual machines (VMs), and cloud-based computational environments

  • The user or developer cedes some control of the platform and interface, in exchange for the platform provider handling the details of workflow distribution and execution

  • AceView, UCSC, RefSeq, and GENCODE each have approximately twenty thousand protein-coding genes; AceView and GENCODE have a greater number of protein-coding transcripts in their databases



Introduction

Bioinformatic workflows based on high-performance computing fall into three main subfamilies: in-house computational packages, virtual machines (VMs), and cloud-based computational environments. Platform-as-a-service approaches take this one step further, offering controlled deployment and fault tolerance across potentially unreliable instances provided by third parties such as Amazon Web Services Elastic Compute Cloud (AWS EC2), and enforcing a standard, such as Docker, for encapsulating developers’ services. Within this framework, the user or developer cedes some control of the platform and interface in exchange for the platform provider handling the details of workflow distribution and execution. Combined with versioning of deployments, this makes it feasible for users to reconstruct results from an earlier point in time while simultaneously re-evaluating the generated data under state-of-the-art implementations.

