Abstract

mzML and mzIdentML are commonly used, powerful formats for representing mass spectrometry data and derived identification information. These formats are complex, requiring non-trivial logic to translate data into the appropriate representation. Most published implementations are tightly coupled to their own data structures. The most complete implementations are written in compiled languages that cannot expose the complete flexibility of the implementation to external programs or bindings. To our knowledge, there are no complete implementations of mzML or mzIdentML writers available to scripting languages like Python or R. We present psims, a library written in Python for writing mzML and mzIdentML. The library allows writing either XML format using built-in Python data structures. It includes a controlled vocabulary resolution system to simplify the encoding process and an identity tracking system to manage entity relationships. The source code is available at https://github.com/mobiusklein/psims, and through the Python Package Index as psims, licensed under the Apache 2.0 license.

Introduction

The proliferation of data processing and identification methods in mass spectrometry has led to ever-increasing complexity for tools that need to describe their results. Over the last decade and a half, the community-driven XML standards for representing spectral data, mzML [1], and peptide/protein identification, mzIdentML [2], have become core to computational methods development [3]. These formats combine a complex XML schema, which defines the structure of the information they contain, with a flexible vocabulary of terms for describing the contents [4].
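
To make the pairing of schema and vocabulary concrete, the following is a minimal sketch of how a term name might be resolved to a controlled vocabulary accession and emitted as a cvParam element. It uses only the Python standard library and is not the psims API: the names CV_TERMS, cv_param, and spectrum_element are hypothetical, the three-term table is a hand-copied excerpt of the PSI-MS vocabulary, and the resulting element omits attributes that a schema-valid mzML spectrum would need.

```python
import xml.etree.ElementTree as ET

# Hand-copied excerpt of the PSI-MS controlled vocabulary (illustrative only;
# a real writer would load the full vocabulary rather than hard-code terms).
CV_TERMS = {
    "ms level": "MS:1000511",
    "MS1 spectrum": "MS:1000579",
    "centroid spectrum": "MS:1000127",
}


def cv_param(name, value=None):
    """Resolve a term name to its accession and build a <cvParam> element.

    Hypothetical helper: raises KeyError if the name is not in CV_TERMS.
    """
    attrib = {"cvRef": "MS", "accession": CV_TERMS[name], "name": name}
    if value is not None:
        attrib["value"] = str(value)
    return ET.Element("cvParam", attrib)


def spectrum_element(spectrum_id, index, params):
    """Build a simplified <spectrum> element from built-in Python structures.

    Each entry of `params` is either a term name (str) or a dict mapping
    term names to values.
    """
    spectrum = ET.Element("spectrum", {"id": spectrum_id, "index": str(index)})
    for param in params:
        if isinstance(param, dict):
            for name, value in param.items():
                spectrum.append(cv_param(name, value))
        else:
            spectrum.append(cv_param(param))
    return spectrum


elem = spectrum_element(
    "scan=1", 0, ["MS1 spectrum", {"ms level": 1}, "centroid spectrum"]
)
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

The caller describes a spectrum with plain strings and dictionaries, and the accession lookup happens in one place, so a misspelled term fails immediately instead of producing a document with an unresolvable parameter.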

