Abstract

Controlled vocabularies (CVs), i.e. collections of predefined terms that describe a modeling domain and are used for the semantic annotation of data, and ontologies are used in structured data formats and databases to avoid inconsistencies in annotation, to provide a unique (and preferably short) accession number for each term and to give researchers and computer algorithms the possibility of more expressive semantic annotation of data. The Human Proteome Organization (HUPO)–Proteomics Standards Initiative (PSI) makes extensive use of ontologies/CVs in its data formats. The PSI-Mass Spectrometry (MS) CV contains all the terms used in the PSI MS–related data standards. The CV has a logical hierarchical structure to ensure ease of maintenance and to support the development of software that makes use of complex semantics. The CV contains the terms required for a complete description of an MS analysis pipeline used in proteomics, including sample labeling, digestion enzymes, instrumentation parts and parameters, software used for the identification and quantification of peptides/proteins, and the parameters and scores used to determine their significance. Owing to the range of topics covered by the CV, collaborative development across several PSI working groups, including proteomics research groups, instrument manufacturers and software vendors, was necessary. In this article, we describe the overall structure of the CV, the process by which it has been developed and is maintained, and its dependencies on other ontologies.

Database URL: http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo
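Because the CV is distributed as an OBO flat file (the psi-ms.obo at the database URL above), software can load its hierarchy directly. The following is a minimal Python sketch, assuming a simplified reading of the OBO format that keeps only the id, name and is_a tags of [Term] stanzas; it is not a complete OBO parser. The three-term excerpt is abbreviated for illustration and should be checked against a released psi-ms.obo, and the helper names (parse_obo_terms, ancestors, is_descendant) are hypothetical. It shows how the is_a hierarchy lets a program ask whether one term is a kind of another.

```python
# Abbreviated, illustrative excerpt in OBO flat-file style; a real release
# of psi-ms.obo contains thousands of terms and many more tags per stanza.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: MS:1000044
name: dissociation method

[Term]
id: MS:1000133
name: collision-induced dissociation
is_a: MS:1000044 ! dissociation method

[Term]
id: MS:1000598
name: electron transfer dissociation
is_a: MS:1000044 ! dissociation method
"""


def parse_obo_terms(text):
    """Map accession -> {"name": ..., "is_a": [...]} for each [Term] stanza.

    Hypothetical helper: only the id, name and is_a tags are read; all other
    tags (def, synonym, relationship, ...) and non-Term stanzas are ignored.
    """
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Start of a new stanza; collect it only if it is a [Term].
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only, so "id: MS:1000133" keeps its value.
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! parent name" comment, keep the accession.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


def ancestors(terms, accession):
    """Every accession reachable from `accession` by following is_a upward."""
    seen = set()
    stack = list(terms.get(accession, {}).get("is_a", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


def is_descendant(terms, child, parent):
    """True if `child` is `parent` or lies below it in the is_a hierarchy."""
    return child == parent or parent in ancestors(terms, child)


terms = parse_obo_terms(SAMPLE_OBO)
print(terms["MS:1000133"]["name"])                          # collision-induced dissociation
print(is_descendant(terms, "MS:1000598", "MS:1000044"))     # True
```

Walking is_a links with an explicit stack and a `seen` set keeps the traversal safe even if a term has several parents, which the OBO format allows.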

Highlights

  • Proteomics is the use of gel electrophoresis and/or chromatography combined with mass spectrometry (MS)-based methods to identify and quantify proteins in complex samples, e.g. blood or urine, with the aim of increasing our understanding of proteins, their functions, interactions, expression control and other properties under normal, diseased or other conditions

  • The HUPO-PSI (Human Proteome Organization–Proteomics Standards Initiative) is a proteomics community organization that defines standard formats for data representation in proteomics to facilitate data comparison, exchange and verification. It has developed a set of XML-based standard formats, including mzML [3] for raw and processed MS data; TraML [4] for the input transitions for selected reaction monitoring (SRM) [5], i.e. targeted proteomics approaches in which only specified m/z values are selected for detection by explicitly specifying the precursor–product transitions; mzIdentML [6] for peptide and protein identification data; and mzQuantML (Walzer et al, in preparation) for proteomics quantification results

  • We describe the PSI-MS controlled vocabulary (CV) as the central terminology reference used by the current proteomics standard formats defined by HUPO-PSI, as well as by the upcoming mzQuantML format for MS quantification information; mzTab (Griss et al, in preparation), a tab-delimited file format for MS identification and quantification information; and PEFF (PSI Extended Fasta Format), a proposed unified format for protein and nucleotide sequence databases for Standard format
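Serving as the central terminology reference means that a data file points at a CV term by its accession; in mzML, for example, this is done through cvParam elements carrying accession and name attributes. The following is a minimal Python sketch of the kind of consistency check this enables, assuming a hand-built three-term CV subset and a simplified, namespace-free XML fragment loosely modeled on mzML's cvParam. It checks that each accession exists, that the name matches the CV, and that the term lies under an expected parent. The helper names, the made-up accession MS:9999999 and the choice of required parent are illustrative only; this is not the PSI's validation tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory CV subset: accession -> (name, direct is_a parents).
CV = {
    "MS:1000044": ("dissociation method", []),
    "MS:1000133": ("collision-induced dissociation", ["MS:1000044"]),
    "MS:1000598": ("electron transfer dissociation", ["MS:1000044"]),
}

# Simplified fragment modeled on mzML's <cvParam> (no XML namespace).
# The second entry has a wrong name; the third uses a made-up accession.
FRAGMENT = """
<activation>
  <cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation"/>
  <cvParam cvRef="MS" accession="MS:1000598" name="electron capture dissociation"/>
  <cvParam cvRef="MS" accession="MS:9999999" name="made-up term"/>
</activation>
"""


def descends_from(accession, ancestor):
    """True if `accession` is `ancestor` or reaches it via is_a links."""
    stack, seen = [accession], set()
    while stack:
        acc = stack.pop()
        if acc == ancestor:
            return True
        if acc in seen or acc not in CV:
            continue
        seen.add(acc)
        stack.extend(CV[acc][1])
    return False


def check_cv_params(xml_text, required_parent):
    """Return human-readable problems found in the cvParam elements."""
    problems = []
    for param in ET.fromstring(xml_text.strip()).iter("cvParam"):
        acc = param.get("accession")
        name = param.get("name")
        if acc not in CV:
            problems.append(f"{acc}: unknown accession")
            continue
        expected = CV[acc][0]
        if name != expected:
            problems.append(f"{acc}: name {name!r} does not match CV name {expected!r}")
        if not descends_from(acc, required_parent):
            problems.append(f"{acc}: not a child of {required_parent}")
    return problems


for problem in check_cv_params(FRAGMENT, "MS:1000044"):
    print(problem)
# MS:1000598: name 'electron capture dissociation' does not match CV name 'electron transfer dissociation'
# MS:9999999: unknown accession
```

Checking the accession first and the name second reflects the point made in the abstract: the accession is the stable identifier, while the human-readable name is only expected to agree with it.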


Summary

The HUPO proteomics standards initiative-mass spectrometry controlled vocabulary

Citation details: Mayer, G., Montecchi-Palazzi, L., Ovelleiro, D. et al. The HUPO proteomics standards initiative-mass spectrometry controlled vocabulary.

Introduction
Spectrum generation information
Spectrum interpretation
Standard
Target list
Transition
Unit
Spectrum identification result details
Some special cases
Dependencies on other ontologies
Statistical metric according to BioPortal
Future directions