Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides.

Husen M Umer, Yafeng Zhu, Timo Sachsenberg, Enrique Audain, Julianus Pfeuffer, Janne Lehtiö, Rui M Branca, Yasset Perez-Riverol

doi:10.1093/bioinformatics/btab838

Abstract

Summary: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools generate protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. They also support exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioPortal, gnomAD and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, including optimized target/decoy generation by the DecoyPyrat algorithm. Finally, we have reanalyzed six public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to more than 5% of the total number of peptides identified.

Availability and implementation: The software is freely available. pypgatk: https://github.com/bigbio/py-pgatk/ and pgdb: https://nf-co.re/pgdb.

Supplementary information: Supplementary data are available at Bioinformatics online.
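The three-frame translation mentioned in the Summary can be illustrated with a minimal Python sketch. This is not the pypgatk API: the function names `translate` and `three_frame_translation`, the `min_length` parameter, and the choice to split each frame at stop codons are assumptions made for illustration only. The sketch uses the standard genetic code and translates the forward strand of a transcript in frames 0, 1 and 2.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) in canonical TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide sequence codon by codon; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    # Stop at len(seq) - 2 so that a trailing partial codon is ignored.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translation(transcript, min_length=1):
    """Translate the forward strand in frames 0, 1 and 2.

    Each frame is split at stop codons ('*'); segments shorter than
    min_length residues are dropped (hypothetical filter, for illustration).
    """
    frames = {}
    for frame in range(3):
        protein = translate(transcript[frame:])
        frames[frame] = [s for s in protein.split("*") if len(s) >= min_length]
    return frames


if __name__ == "__main__":
    for frame, segments in three_frame_translation("ATGGCCTAAGGT").items():
        print(frame, segments)
```

A real database-generation workflow would read transcript sequences from ENSEMBL files and write the translated segments as FASTA entries; the sketch only covers the translation step itself.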

Journal: Bioinformatics	Publication Date: Dec 14, 2021
Citations: 19	License type: CC BY-NC 4.0

Similar Papers

Unbiased Mitoproteome Analyses Confirm Non-canonical RNA, Expanded Codon Translations
Hervé Seligmann
Computational and Structural Biotechnology Journal | VOL. 14
01 Jan 2015

MicroRNA and Alternative mRNA Splicing Events in Cancer Drug Response/Resistance: Potent Therapeutic Targets.
Rahaba Marima ... David Owen Bates
Biomedicines | VOL. 9
02 Dec 2021

Alternative Splicing Events as Indicators for the Prognosis of Uveal Melanoma.
Qi Wan ... Lin Jin
Genes | VOL. 11
21 Feb 2020

Fully sequencing the cassava full-length cDNA library reveals unannotated transcript structures and alternative splicing events in regions with a high density of single nucleotide variations, insertions-deletions, and heterozygous sequences.
Akihiro Ezoe ... Anh Thu Vu
Plant molecular biology | VOL. 112
04 Apr 2023