VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

Elena S Peterson,Jeffrey L Jensen,Joshua N Adkins,Hyunjoo Walker,Charles Ansong,Samuel H Payne,Lee Ann Mccue,Markus A Kobold,Samantha R Webb,William R Cannon,Alexandra C Schrimpe-Rutledge,Bobbie-Jo M Webb-Robertson

doi:10.1186/1471-2164-13-131

Open Access

https://doi.org/10.1186/1471-2164-13-131

Journal: BMC Genomics	Publication Date: Apr 5, 2012
Citations: 60	License type: cc-by

Affiliation: Pacific Northwest National Laboratory

Abstract

Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates.

Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. Two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone or in combination with transcriptomic data.

Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches, and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
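The idea of searching for regions with a high likelihood of mis-annotation can be illustrated with a small sketch. This is not VESPA's implementation (VESPA is a Java application and its search logic is not described on this page); it assumes, purely for illustration, that each peptide has already been mapped to genome coordinates and that a peptide lying outside every annotated feature on its strand is worth a closer look. The names `Interval`, `is_contained`, and `orphan_peptides` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A stretch of the genome (hypothetical helper type, not a VESPA class)."""
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def is_contained(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely within `outer` on the same strand."""
    return (
        inner.strand == outer.strand
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


def orphan_peptides(peptides, features):
    """Return peptides that are not contained in any annotated feature.

    Assumption: such peptides hint at a missed gene or a wrong gene boundary.
    """
    return [p for p in peptides if not any(is_contained(p, f) for f in features)]


# Toy data: two annotated ORFs and three peptide locations.
features = [Interval(100, 400, "+"), Interval(600, 900, "-")]
peptides = [
    Interval(130, 162, "+"),  # inside the first ORF
    Interval(450, 482, "+"),  # intergenic
    Interval(610, 642, "+"),  # overlaps the second ORF, but on the opposite strand
]

orphans = orphan_peptides(peptides, features)
for p in orphans:
    print(f"candidate region: {p.start}-{p.end} ({p.strand})")
```

A linear scan over all features is enough for a toy example; a genome-scale tool would more plausibly sort the features or use an interval index.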

Highlights

The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge
Data import, processing and summarization: Visual Exploration and Statistics to Promote Annotation (VESPA) works under the concept of a project which, when created, requires at minimum the genomic sequence of a chromosome or plasmid and the defined gene features (ORFs and RNA genes) in general feature format (GFF)
For organisms with multiple genetic elements, a unique project can be created and saved for each element using the same proteomics file and transcriptomic files tailored to each DNA file
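As a rough illustration of the minimum project inputs named in the highlights above (a genomic sequence plus gene features in GFF), the following sketch reads the nine tab-separated GFF columns into simple records. The `Project` and `Feature` classes and the `parse_gff` function are hypothetical names chosen for this example; they do not reflect VESPA's actual data model or file handling.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One GFF record, keeping only the columns used here (hypothetical type)."""
    seqid: str
    ftype: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    strand: str
    attributes: str


@dataclass
class Project:
    """Assumed minimal project: one genetic element and its annotated features."""
    sequence: str
    features: list = field(default_factory=list)


def parse_gff(text: str) -> list:
    """Parse GFF text, skipping comments, blank lines, and malformed rows."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        features.append(Feature(seqid, ftype, int(start), int(end), strand, attrs))
    return features


# Toy input: one protein-coding feature and one RNA gene on a made-up element.
GFF_TEXT = (
    "##gff-version 3\n"
    "chr1\texample\tCDS\t100\t400\t.\t+\t0\tID=cds1\n"
    "chr1\texample\ttRNA\t600\t675\t.\t-\t.\tID=trna1\n"
)

project = Project(sequence="ACGT" * 250, features=parse_gff(GFF_TEXT))
for f in project.features:
    print(f.ftype, f.start, f.end, f.strand)
```

Under this assumption, an organism with several genetic elements would simply get one `Project` instance per chromosome or plasmid.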

Summary

Introduction

The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. High-throughput (HTP) technologies such as next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators [3,4,5,6,7,8,9] as individual sources of information, for example to locate features such as missed genes and intron/exon borders. However, integrating these HTP data types to validate and improve structural genome annotation is not straightforward and is still very labor-intensive, and few computational tools have been developed to address this issue. Visualization and analysis of these data in genomic context, in order to enhance the annotations or make inferences about mis-annotations, remains a challenge.
