Abstract

Background

Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires the integration of disparate ‘omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing, and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation.

Findings

MVP is built as an HTML Galaxy plug-in, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input: a custom data type (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrative Genomics Viewer (IGV) JavaScript framework, enabling interactive visualization of peptides and the corresponding transcript and genomic coding information within the MVP interface.

Conclusions

MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for exploring and interpreting results. Its extensible design provides a basis for developing new functionalities for proteogenomic data visualization.
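To make the threshold-based peptide filtering described above more concrete, here is a minimal Python sketch of such a query using the standard `sqlite3` module. It is not MVP's implementation (MVP runs in JavaScript inside Galaxy). The table `peptide_hits`, its `sequence`, `score` and `q_value` columns, the helper names, and the default thresholds are all hypothetical stand-ins; the actual mzSQLite schema is not described here.

```python
import sqlite3

# Hypothetical table layout for illustration; the real mzSQLite schema differs.
SCHEMA = """
CREATE TABLE peptide_hits (
    sequence TEXT NOT NULL,  -- identified peptide sequence
    score    REAL NOT NULL,  -- search-engine score, assumed higher = better
    q_value  REAL NOT NULL   -- estimated false discovery rate of the match
)
"""


def demo_connection():
    """Build an in-memory database holding a few made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO peptide_hits VALUES (?, ?, ?)",
        [
            ("PEPTIDEK", 42.0, 0.001),
            ("SHORTK", 50.0, 0.001),      # fails the minimum length
            ("LOWQUALITYR", 10.0, 0.20),  # fails the q-value threshold
            ("VARIANTPEPK", 35.5, 0.005),
        ],
    )
    return conn


def filter_peptides(conn, min_length=7, max_length=30, max_q=0.01, min_score=0.0):
    """Return distinct peptide sequences that pass all user-defined thresholds."""
    rows = conn.execute(
        """
        SELECT DISTINCT sequence
        FROM peptide_hits
        WHERE length(sequence) BETWEEN ? AND ?
          AND q_value <= ?
          AND score >= ?
        ORDER BY sequence
        """,
        (min_length, max_length, max_q, min_score),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(filter_peptides(demo_connection()))  # ['PEPTIDEK', 'VARIANTPEPK']
```

Keeping the thresholds as bound parameters (`?` placeholders) rather than formatting them into the SQL string avoids injection problems and lets an interface re-run the same query each time a user adjusts a filter.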

Highlights

  • Proteogenomics integrates genomics, transcriptomics and mass spectrometry (MS)‐based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants

  • A proteogenomics‐based study starts with a sample, which is analyzed using both next‐generation sequencing technologies and MS‐based proteomics

  • The customized protein sequence database searched with the MS data contains both proteins of known sequence from reference databases and novel protein sequences derived from the transcriptome sequence via comparison to the reference genome sequence
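As a simplified illustration of how a variant transcript can yield a novel protein sequence, the sketch below translates a reference and a variant coding sequence with the standard genetic code and reports single amino-acid substitutions. It assumes the coding sequences are already extracted and in frame. The function names are hypothetical, and real proteogenomic database-building pipelines also handle insertions, deletions, frameshifts and splice variants, which this sketch ignores.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest, so "TTT" -> F, "TTC" -> F, ..., "GGG" -> G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        # Codons containing N or other ambiguity codes translate to 'X'.
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitutions(ref_cds, alt_cds):
    """Report amino-acid substitutions as e.g. 'R2W' (1-based positions).

    Only the overlapping length of the two translations is compared, so
    indels, frameshifts and premature stops are not described here.
    """
    ref, alt = translate(ref_cds), translate(alt_cds)
    return [
        f"{r}{pos}{a}"
        for pos, (r, a) in enumerate(zip(ref, alt), start=1)
        if r != a
    ]


if __name__ == "__main__":
    reference = "ATGCGGAAATAA"  # Met-Arg-Lys-stop
    variant = "ATGTGGAAATAA"    # single C>T change: CGG (Arg) -> TGG (Trp)
    print(translate(reference), translate(variant))  # MRK MWK
    print(substitutions(reference, variant))         # ['R2W']
```

Here a single C-to-T change turns codon CGG (Arg) into TGG (Trp); in a proteogenomic workflow the variant protein `MWK` would sit in the search database alongside the reference `MRK`.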


Summary

Introduction

Proteogenomics integrates genomics, transcriptomics and mass spectrometry (MS)‐based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. The SQLite inputs to MVP (the mzSQLite database, the variant annotation table and the coding sequence table) have been developed to supply the data needed for a full exploration of proteogenomic results, including evaluation of the MS/MS data supporting the identification of novel peptide sequences and visualization of peptide sequences mapped to their corresponding transcript and genomic coding sequences. This mapping is required to view identified peptide sequences (variant or reference) against the genome and the corresponding transcript sequence data derived from supporting RNA‐Seq data in proteogenomic studies.
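Mapping a peptide onto genomic coordinates comes down to codon arithmetic: amino acid i (0-based) of a protein is encoded by CDS nucleotides 3i to 3i+2, and those CDS offsets are then walked across the exons. The following minimal sketch assumes the CDS is given as an ordered list of exon intervals (1-based, inclusive genomic coordinates, listed 5' to 3' along the transcript) and that the protein is the exact translation of that CDS. The function names and input layout are hypothetical and do not reflect the format of MVP's coding sequence table.

```python
def cds_offset_to_genomic(offset, exons, strand):
    """Map a 0-based CDS nucleotide offset to a 1-based genomic position.

    `exons` holds the CDS segments as (start, end) pairs, 1-based and inclusive,
    listed in transcript order (5' to 3'); `strand` is "+" or "-".
    """
    remaining = offset
    for start, end in exons:
        length = end - start + 1
        if remaining < length:
            # On the minus strand the transcript runs from `end` toward `start`.
            return start + remaining if strand == "+" else end - remaining
        remaining -= length
    raise ValueError("offset lies beyond the annotated CDS")


def peptide_genomic_blocks(peptide, protein, exons, strand):
    """Return (low, high) genomic blocks encoding the first match of `peptide`."""
    aa_index = protein.find(peptide)
    if aa_index < 0:
        raise ValueError("peptide not found in protein")
    first = 3 * aa_index                      # first base of the first codon
    last = 3 * (aa_index + len(peptide)) - 1  # last base of the last codon
    step = 1 if strand == "+" else -1
    blocks = []
    for offset in range(first, last + 1):
        pos = cds_offset_to_genomic(offset, exons, strand)
        if blocks and pos == blocks[-1][1] + step:
            blocks[-1][1] = pos  # still contiguous within the same exon
        else:
            blocks.append([pos, pos])  # first base, or a jump across an intron
    return [(min(a, b), max(a, b)) for a, b in blocks]


if __name__ == "__main__":
    exons = [(100, 105), (200, 211)]  # hypothetical two-exon CDS on "+"
    protein = "MRKLLE"                # assumed translation of that CDS
    print(peptide_genomic_blocks("RK", protein, exons, "+"))
    # [(103, 105), (200, 202)]: the peptide spans the exon junction
```

A peptide whose codons straddle an exon junction comes back as two genomic blocks, which a genome browser track has to draw as a split alignment rather than a single contiguous feature.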

