Psims - A Declarative Writer for mzML and mzIdentML for Python

Joshua Klein, Joseph Zaia

doi:10.1074/mcp.rp118.001070

Open Access

Journal: Molecular & Cellular Proteomics
Publication Date: Mar 1, 2019
Citations: 13
License type: CC BY

Affiliation: Boston University

Abstract

mzML and mzIdentML are commonly used, powerful tools for representing mass spectrometry data and derived identification information. These formats are complex, requiring non-trivial logic to translate data into the appropriate representation. Most published implementations are tightly coupled to data structures. The most complete implementations are written in compiled languages that cannot expose the complete flexibility of the implementation to external programs or bindings. To our knowledge, there are no complete implementations for mzML or mzIdentML available to scripting languages like Python or R. We present psims, a library written in Python for writing mzML and mzIdentML. The library allows writing either XML format using built-in Python data structures. It includes a controlled vocabulary resolution system to simplify the encoding process and an identity tracking system to manage entity relationships. The source code is available at https://github.com/mobiusklein/psims, and through the Python Package Index as psims, licensed under the Apache License 2.0.
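
To make the last two features more concrete, the following is a minimal standard-library sketch of what controlled vocabulary resolution and identity tracking could look like. It does not use psims and is not its API: `TINY_CV`, `resolve_term`, and `IdRegistry` are hypothetical names invented for illustration, and the three-term vocabulary is a hand-copied excerpt of the PSI-MS controlled vocabulary.

```python
# Hypothetical excerpt of the PSI-MS controlled vocabulary: term name -> accession.
TINY_CV = {
    "ms level": "MS:1000511",
    "MS1 spectrum": "MS:1000579",
    "centroid spectrum": "MS:1000127",
}


def resolve_term(param):
    """Turn a term name, or a single-item {name: value} dict, into a cvParam-like dict."""
    if isinstance(param, dict):
        (name, value), = param.items()
    else:
        name, value = param, None
    if name not in TINY_CV:
        raise KeyError(f"unknown controlled vocabulary term: {name!r}")
    resolved = {"cvRef": "MS", "accession": TINY_CV[name], "name": name}
    if value is not None:
        resolved["value"] = str(value)
    return resolved


class IdRegistry:
    """Record declared ids per entity type and reject references to undeclared ones."""

    def __init__(self):
        self._declared = {}

    def declare(self, entity_type, entity_id):
        ids = self._declared.setdefault(entity_type, set())
        if entity_id in ids:
            raise ValueError(f"duplicate {entity_type} id: {entity_id!r}")
        ids.add(entity_id)
        return entity_id

    def reference(self, entity_type, entity_id):
        if entity_id not in self._declared.get(entity_type, set()):
            raise KeyError(f"{entity_type} {entity_id!r} was never declared")
        return entity_id


registry = IdRegistry()
registry.declare("InstrumentConfiguration", "IC1")

# A spectrum described with built-in data structures only.
spectrum = {
    "id": "scan=1",
    "instrument_ref": registry.reference("InstrumentConfiguration", "IC1"),
    "params": [resolve_term(p) for p in ["MS1 spectrum", {"ms level": 1}]],
}
print(spectrum["params"])
```

In this sketch the caller writes human-readable term names and plain ids, while the lookup table and the registry catch misspelled terms and dangling references before any XML is produced.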

Highlights

The proliferation of data processing and identification methods in mass spectrometry has led to ever increasing complexity for tools that need to describe their results
The most complete implementations are written in compiled languages that cannot expose the complete flexibility of the implementation to external programs or bindings
The source code is available at https://github.com/mobiusklein/psims, and through the Python Package Index as psims, licensed under the Apache License 2.0

Summary

Introduction

The proliferation of data processing and identification methods in mass spectrometry has led to ever-increasing complexity for tools that need to describe their results. Over the last decade and a half, the community-driven XML standards for representing spectral data, mzML [1], and peptide/protein identification, mzIdentML [2], have become core to computational methods development [3]. These formats combine a complex XML schema for defining the structure of the information contained with a flexible vocabulary of terms for describing the contents [4].
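
As an illustration of this split between schema and vocabulary, the sketch below builds a single mzML-style `spectrum` element with Python's standard `xml.etree.ElementTree`. The element and attribute names follow the mzML schema, and the accessions are terms from the PSI-MS controlled vocabulary. The helper `cv_param` is a hypothetical name, and the fragment is only an excerpt, not a complete or valid mzML document.

```python
import xml.etree.ElementTree as ET


def cv_param(parent, accession, name, value=None):
    """Append a <cvParam> child: the schema fixes the element, the vocabulary supplies the meaning."""
    attrib = {"cvRef": "MS", "accession": accession, "name": name}
    if value is not None:
        attrib["value"] = str(value)
    return ET.SubElement(parent, "cvParam", attrib)


# Structure comes from the schema: a <spectrum> element with its required attributes.
spectrum = ET.Element("spectrum", {"id": "scan=1", "index": "0", "defaultArrayLength": "0"})

# Content is described with controlled vocabulary terms.
cv_param(spectrum, "MS:1000511", "ms level", 1)
cv_param(spectrum, "MS:1000127", "centroid spectrum")

fragment = ET.tostring(spectrum, encoding="unicode")
print(fragment)
```

The same `cvParam` element is reused for every kind of annotation, so new terms can be added to the vocabulary without changing the schema.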
