PGP: parallel prokaryotic proteogenomics pipeline for MPI clusters, high-throughput batch clusters and multicore workstations

Andrey Tovchigrechko, Samuel H Payne, Pratap Venepally

doi:10.1093/bioinformatics/btu051

Open Access

Journal: Bioinformatics	Publication Date: Jan 27, 2014
Citations: 26	License type: CC BY 3.0

Affiliation: Pacific Northwest National Laboratory

Abstract

Summary: We present the first public release of our proteogenomic annotation pipeline. We previously used our original, unreleased implementation to improve the annotation of 46 diverse prokaryotic genomes by analyzing proteomic mass-spectrometry data to discover novel genes and post-translational modifications and to correct erroneous annotations. This public version has been redesigned to run in a wide range of parallel Linux computing environments and is provided with automated configuration, build and testing facilities for easy deployment and portability.

Availability and implementation: Source code is freely available from https://bitbucket.org/andreyto/proteogenomics under the GPL license. It is implemented in Python and C++. It bundles the Makeflow engine to execute the workflows.

Contact: atovtchi@jcvi.org

Highlights

Our pipeline is a tool for improving existing genomic annotations using available proteomics mass spectrometry data.
VICS has never been deployed outside of the J. Craig Venter Institute (JCVI), and the pipeline itself required manual configuration and building by the developers. It could only use the Sun Grid Engine (SGE) batch queuing system configured for high-throughput computing (HTC) mode, in which large numbers of serial jobs could be efficiently scheduled on a [...]
High-throughput computing (HTC) clusters are widely used as local bioinformatics computing resources.

Summary

INTRODUCTION

Our pipeline is a tool for improving existing genomic annotations using available proteomics mass spectrometry data. VICS has never been deployed outside of the JCVI, and the pipeline itself required manual configuration and building by the developers. It could only use the Sun Grid Engine (SGE) batch queuing system configured for high-throughput computing (HTC) mode, in which large numbers of serial jobs could be efficiently scheduled on a [...]

HTC clusters are widely used as local bioinformatics computing resources. These clusters are configured to efficiently schedule large numbers of serial jobs under the control of a batch queuing system. The volume of computations in proteogenomics is relatively high, with 100 CPU hours for a typical bacterial genome. Our pipeline performs such an annotation in 3 h of wall clock time on an HTC cluster.

One example is the Mycobacterium tuberculosis H37Rv genome (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Mycobacterium_tuberculosis_H37Rv_uid57777/NC_000962.gbk) containing the CDS attributes /experiment="EXISTENCE: identified in proteomics study".
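The CDS `/experiment` qualifier mentioned above can be checked directly in a GenBank flat file. The following is a minimal sketch in plain Python, not code from the pipeline: the function name `count_proteomics_cds` and the tiny synthetic record `SAMPLE_RECORD` are assumptions made for illustration, and the parser relies only on the standard flat-file layout (feature keys indented by five spaces, qualifiers on the indented lines that follow). It counts how many CDS features carry the proteomics evidence string.

```python
import re

# Evidence string quoted in the text for the M. tuberculosis H37Rv record.
MARKER = "EXISTENCE: identified in proteomics study"

# Feature-key lines: exactly five spaces, the key, then the location.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")


def count_proteomics_cds(genbank_text, marker=MARKER):
    """Return (total_cds, cds_with_marker) for one GenBank flat-file record.

    Hypothetical helper for illustration; it is not part of the pipeline.
    """
    total = 0
    supported = 0
    in_features = False
    current_key = None
    qualifier_lines = []

    def flush():
        # Tally the feature collected so far, if it is a CDS.
        nonlocal total, supported
        if current_key == "CDS":
            total += 1
            # Qualifier values may wrap, so join lines before searching.
            joined = " ".join(qualifier_lines)
            if "/experiment=" in joined and marker in joined:
                supported += 1

    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            break
        match = FEATURE_KEY.match(line)
        if match:
            flush()
            current_key = match.group(1)
            qualifier_lines = []
        elif current_key is not None and line.startswith(" "):
            qualifier_lines.append(line.strip())
    flush()
    return total, supported


# Synthetic record (an assumption, not real annotation data).
SAMPLE_RECORD = """\
LOCUS       TEST
FEATURES             Location/Qualifiers
     source          1..300
                     /organism="Example"
     gene            1..90
                     /locus_tag="X_0001"
     CDS             1..90
                     /locus_tag="X_0001"
                     /experiment="EXISTENCE: identified in proteomics
                     study"
     CDS             100..190
                     /locus_tag="X_0002"
ORIGIN
//
"""

if __name__ == "__main__":
    total_cds, with_evidence = count_proteomics_cds(SAMPLE_RECORD)
    print(f"{with_evidence} of {total_cds} CDS features carry the marker")
```

For a real record, the same function could be applied to the text of a downloaded `.gbk` file; a full-featured parser such as Biopython would be a more robust choice for production use.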

Parallelization strategy

Installation and execution
