Abstract

Summary: We present the first public release of our proteogenomic annotation pipeline. We previously used the original, unreleased implementation to improve the annotation of 46 diverse prokaryotic genomes by analyzing proteomic mass-spectrometry data, which allowed us to discover novel genes and post-translational modifications and to correct erroneous annotations. This public version has been redesigned to run in a wide range of parallel Linux computing environments and is provided with automated configuration, build and testing facilities for easy deployment and portability.

Availability and implementation: Source code is freely available from https://bitbucket.org/andreyto/proteogenomics under the GPL license. It is implemented in Python and C++ and bundles the Makeflow engine to execute the workflows.

Contact: atovtchi@jcvi.org

Highlights

  • Our pipeline is a tool for improving existing genomic annotations using available proteomics mass spectrometry data

  • VICS has never been deployed outside of the J. Craig Venter Institute (JCVI), and the pipeline itself required manual configuration and building by the developers. It could only use the Sun Grid Engine (SGE) batch queuing system configured for high-throughput computing (HTC) mode, in which large numbers of serial jobs could be efficiently scheduled on a ...

  • High-throughput computing (HTC) clusters are widely used as local bioinformatics computing resources


Summary

INTRODUCTION

Our pipeline is a tool for improving existing genomic annotations using available proteomics mass spectrometry data. VICS has never been deployed outside of the JCVI, and the pipeline itself required manual configuration and building by the developers. It could only use the Sun Grid Engine (SGE) batch queuing system configured for high-throughput computing (HTC) mode, in which large numbers of serial jobs could be efficiently scheduled on a ...

High-throughput computing (HTC) clusters are widely used as local bioinformatics computing resources. These clusters are configured to efficiently schedule large numbers of serial jobs under the control of a batch queuing system. The volume of computations in proteogenomics is relatively high, with 100 CPU hours for a typical bacterial genome. Our pipeline performs such an annotation in 3 h of wall clock time on an HTC cluster. One example is the Mycobacterium tuberculosis H37Rv genome (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Mycobacterium_tuberculosis_H37Rv_uid57777/NC_000962.gbk), containing the CDS attribute /experiment="EXISTENCE: identified in proteomics study".
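To make the /experiment attribute mentioned above concrete, the following is a minimal sketch of how CDS features carrying that qualifier could be located in GenBank flat-file text. It is not part of the released pipeline. The function names `iter_features` and `cds_with_proteomic_evidence` are hypothetical, and the `SAMPLE` excerpt is hand-written for illustration (its locus tags and coordinates are invented, not taken from NC_000962.gbk). The parser assumes the usual feature-table layout, in which a feature key is indented by five spaces and qualifier lines are indented further.

```python
import re

# Hypothetical, hand-written excerpt in GenBank feature-table layout.
# Locus tags and coordinates are illustrative only, not real annotation.
SAMPLE = """\
     CDS             1..1524
                     /locus_tag="demo_0001"
                     /experiment="EXISTENCE: identified in proteomics
                     study"
     CDS             2052..3260
                     /locus_tag="demo_0002"
     gene            3280..4437
                     /locus_tag="demo_0003"
"""

# A feature line: exactly five leading spaces, the key, then the location.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S+)\s*$")

# Qualifier string quoted in the text above.
MARKER = '/experiment="EXISTENCE: identified in proteomics study"'


def iter_features(text):
    """Yield (key, location, qualifier_text) for each feature in the table."""
    key, location, parts = None, None, []
    for line in text.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            # A new feature begins; emit the one collected so far.
            if key is not None:
                yield key, location, " ".join(parts)
            key, location, parts = match.group(1), match.group(2), []
        elif key is not None and line.strip():
            # Qualifier line or the wrapped continuation of a qualifier value.
            parts.append(line.strip())
    if key is not None:
        yield key, location, " ".join(parts)


def cds_with_proteomic_evidence(text):
    """Return locations of CDS features that carry the proteomics qualifier."""
    return [
        location
        for key, location, qualifiers in iter_features(text)
        if key == "CDS" and MARKER in qualifiers
    ]


if __name__ == "__main__":
    print(cds_with_proteomic_evidence(SAMPLE))
```

Wrapped qualifier values are rejoined with a single space before matching, which is why the marker is still found when the value is split across two lines. For real GenBank files, an established parser such as Biopython's would likely be a more robust choice than this line-based scan.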
