Abstract
The HUPO Proteomics Standards Initiative has dedicated the last six years to designing and implementing common data reporting and exchange standards. These standards let proteomics data pass from originator to collaborator and, immediately prior to publication, into a final public repository. The user community is now benefiting from this work. XML formats are used to exchange data and import it into databases, which allows direct access and comparison irrespective of the originating instrumentation. Public repositories now allow researchers to view and search published experimental data, and reference datasets are becoming available for benchmarking. Collaborations between databases are exposing these datasets to an ever-increasing audience and enabling exciting new science to be derived from existing data.
Highlights
The nascent field of proteomics was in a state of relative chaos, with ever-growing amounts of data being generated by increasingly high-throughput machines but no downstream repositories in place to capture this information
Deposition of raw data into public-domain repositories will allow comparison of datasets generated by different research groups. For example, protein sets from diseased tissue can be compared against reference sets generated from normal, healthy tissue, or the potential contaminating effects of the known plasma proteome can be accounted for when examining the proteome of tissues such as the heart or liver
For each of the areas covered by the HUPO Proteomics Standards Initiative, Minimum Information About a Proteomics Experiment (MIAPE) documents have been developed. These are analogous to the MIAME (Minimum Information About a Microarray Experiment) guidelines for DNA microarray experiments (Brazma et al, 2001) and define the data items that should minimally be reported about a proteomics experiment to allow its critical assessment
The mass spectrometry workgroup published the mzData XML format in 2004. It allowed the storage of proteomics-related mass spectral data, ranging from basic details about the sample, the instrument and the data processing steps through to the actual spectral lists of mass-to-charge values and intensities, using base64 encoding to represent the floating-point mass-to-charge (m/z) and ion intensity values (Orchard et al, 2005)
Summary
The nascent field of proteomics was in a state of relative chaos, with ever-growing amounts of data being generated by increasingly high-throughput machines but no downstream repositories in place to capture this information. The only option available to researchers was to publish the cream of the results in a journal article. Protein lists were largely available only in the Supplemental Information, and the reader had no access to the raw data from which these lists were generated. Any data not included in the article was lost, and researchers were unable to track individual proteins across datasets to obtain a picture of their expression profiles. To add to this confusion, raw data generated by different mass spectrometers, and the output of each peptide search engine, was available only in the manufacturer's proprietary format. Comparing such data was therefore impossible, even when it had been generated within the same laboratory on different machines.
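The highlights note that mzData stored floating-point m/z and intensity values using base64 encoding. The minimal Python sketch below illustrates that general technique by packing an array as IEEE 754 binary and base64-encoding the bytes as plain text that can sit inside an XML element. Several details are assumptions and do not come from the mzData schema: the choice of 32- or 64-bit precision, the byte-order flag, and the function names `encode_floats` and `decode_floats`. This is not a PSI reference implementation.

```python
import base64
import struct

# Assumed precisions: struct format character and byte width per value.
_FORMATS = {32: ("f", 4), 64: ("d", 8)}


def _layout(precision, little_endian):
    """Return (struct format char, byte width, byte-order prefix)."""
    if precision not in _FORMATS:
        raise ValueError("precision must be 32 or 64")
    code, width = _FORMATS[precision]
    order = "<" if little_endian else ">"
    return code, width, order


def encode_floats(values, precision=64, little_endian=True):
    """Pack floats as IEEE 754 binary, then base64-encode to ASCII text."""
    code, _, order = _layout(precision, little_endian)
    raw = struct.pack(f"{order}{len(values)}{code}", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_floats(text, precision=64, little_endian=True):
    """Reverse encode_floats: base64-decode, then unpack the binary floats."""
    code, width, order = _layout(precision, little_endian)
    raw = base64.b64decode(text)
    if len(raw) % width:
        raise ValueError("byte length is not a multiple of the value width")
    return list(struct.unpack(f"{order}{len(raw) // width}{code}", raw))


if __name__ == "__main__":
    # Hypothetical peak list: m/z values and matching intensities.
    mz = [445.12, 521.3, 1021.55]
    intensity = [1200.0, 350.5, 88.25]
    mz_text = encode_floats(mz)  # 64-bit round-trips Python floats exactly
    int_text = encode_floats(intensity, precision=32)
    print(mz_text)
    print(decode_floats(mz_text) == mz)
    print(decode_floats(int_text, precision=32) == intensity)
```

Encoding the arrays this way keeps full binary precision while the document remains valid XML text. The trade-off is roughly a one-third size increase, because base64 emits four characters for every three input bytes. A reader must also know the precision and byte order used by the writer, because the base64 text alone does not record them.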