Abstract

With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analyses of quantitative measurements from genomic and proteomic studies have yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches. In this article, we systematically classify published methods and tools into four major categories: (1) sequence-centric proteogenomics; (2) analysis of proteogenomic relationships; (3) integrative modeling of proteogenomic data; and (4) data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications.

Highlights

  • The last decade has witnessed the rapid emergence of proteogenomics, a new research field at the interface of genomics and proteomics

  • A more systematic study using messenger RNA (mRNA) and protein profiling data from the three CPTAC cancer types showed that proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways [104]

  • Rapid technological developments occurring over the last decade have made it possible to generate large proteogenomic data sets, thereby driving the development of new methods for proteogenomic data analysis

Summary

Personalized Protein Sequence Databases. Reference protein sequence databases, such as those from Ensembl or RefSeq, are typically used to identify mass spectra through peptide spectrum matching. Because these databases lack sample-specific sequence variation, including single amino acid variants (SAAVs), insertions, deletions, alternative splice junctions and novel gene fusions, studies using this approach cannot identify the corresponding variant peptides present in the MS/MS data. This is an important limitation to consider in cancer studies, where patients acquire tumor-specific somatic variation.
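To make the limitation concrete, the following minimal sketch shows why a reference-only database misses a variant peptide. It is an illustration under stated assumptions, not the method of any tool reviewed here: the protein sequence and the Y5C substitution are invented toy data, the function names are hypothetical, and digestion uses a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages).

```python
def apply_saav(sequence, position, ref_aa, alt_aa):
    """Return the protein sequence with one amino acid substituted.

    `position` is 1-based, as in protein-level variant notation.
    """
    index = position - 1
    if sequence[index] != ref_aa:
        raise ValueError(
            f"expected {ref_aa} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + alt_aa + sequence[index + 1:]


def tryptic_digest(sequence):
    """Simplified in-silico trypsin digestion (assumption: no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cleave after K or R, except when the next residue is P.
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def variant_only_peptides(reference, variant):
    """Peptides produced by the variant protein but absent from the reference digest."""
    reference_peptides = set(tryptic_digest(reference))
    return [p for p in tryptic_digest(variant) if p not in reference_peptides]


# Toy reference protein and a hypothetical SAAV (Y5C); both are invented.
reference_protein = "MKTAYIAKQRQISFVK"
variant_protein = apply_saav(reference_protein, 5, "Y", "C")

print(tryptic_digest(reference_protein))  # ['MK', 'TAYIAK', 'QR', 'QISFVK']
print(variant_only_peptides(reference_protein, variant_protein))  # ['TACIAK']
```

A spectrum generated by the peptide TACIAK has no matching entry in a search space built from the reference sequence alone. Appending the variant sequence (or only its variant-spanning peptides) to the database is the basic idea behind a personalized protein sequence database.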

ANALYSIS OF PROTEOGENOMIC RELATIONSHIPS
INTEGRATIVE MODELING OF PROTEOGENOMIC DATA
DATA SHARING AND VISUALIZATION
TABLE III Computational resources for pathway and gene ontology enrichment
Findings
CONCLUSION