A fast and efficient python library for interfacing with the Biological Magnetic Resonance Data Bank

Andrey Smelter, Morgan Astra, Hunter N B Moseley

doi:10.1186/s12859-017-1580-5

Abstract

Background: The Biological Magnetic Resonance Data Bank (BMRB) is a public repository of Nuclear Magnetic Resonance (NMR) spectroscopic data of biological macromolecules. It is an important resource for many researchers using NMR to study structural, biophysical, and biochemical properties of biological macromolecules. It is primarily maintained and accessed in a flat-file ASCII format known as NMR-STAR. While the format is human-readable, the size of most BMRB entries makes computer readability and explicit representation a practical requirement for almost any rigorous systematic analysis.

Results: To aid in the use of this public resource, we have developed a package called nmrstarlib in the popular open-source programming language Python. The nmrstarlib implementation is very efficient, both in design and execution. The library has facilities for reading and writing both NMR-STAR version 2.1 and 3.1 formatted files, parsing them into usable Python dictionary- and list-based data structures, making access and manipulation of the experimental data very natural within Python programs (i.e., “saveframe” and “loop” records are represented as individual Python dictionary data structures). Another major advantage of this design is that data stored in the original NMR-STAR format can be easily converted into its equivalent JavaScript Object Notation (JSON) format, a lightweight data interchange format, facilitating data access and manipulation using Python and any other programming language that implements a JSON parser/generator (i.e., all popular programming languages). We have also developed tools to visualize assigned chemical shift values and to convert between NMR-STAR and JSONized NMR-STAR formatted files. Full API reference documentation, a user guide, and a tutorial with code examples are also available. We have tested this new library on all current BMRB entries: 100% of entries are parsed without any errors for both NMR-STAR version 2.1 and version 3.1 formatted files. We also compared our software to three currently available Python libraries for parsing NMR-STAR formatted files: PyStarLib, NMRPyStar, and PyNMRSTAR.

Conclusions: The nmrstarlib package is a simple, fast, and efficient library for accessing data from the BMRB. The library provides an intuitive dictionary-based interface with which Python programs can read, edit, and write NMR-STAR formatted files and their equivalent JSONized NMR-STAR files. The nmrstarlib package can be used as a library for accessing and manipulating data stored in NMR-STAR files, and as a command-line tool to convert from the NMR-STAR file format into its equivalent JSON file format and vice versa, and to visualize chemical shift values. Furthermore, the nmrstarlib implementation provides a guide for effectively JSONizing other older scientific formats, improving the FAIRness of data in these formats.
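To make the dictionary-based design concrete, here is a minimal sketch of the idea: a saveframe becomes a Python dict of tag/value pairs, each loop becomes a list of row dicts, and the whole structure serializes to JSON with the standard `json` module. This is not nmrstarlib's parser or API. The function `parse_simple_star`, the `"loop_<n>"` key naming, and the toy fragment (modeled loosely on NMR-STAR 3.1 tag names) are hypothetical, and the sketch deliberately ignores real NMR-STAR features such as quoted values with spaces and semicolon-delimited multi-line values.

```python
import json

# Hypothetical toy input in a simplified NMR-STAR 3.1-like layout: values
# contain no spaces, quotes, or semicolon-delimited multi-line blocks.
STAR_TEXT = """
data_example
save_assigned_chem_shift_list_1
   _Assigned_chem_shift_list.Sf_category   assigned_chemical_shifts
   _Assigned_chem_shift_list.ID            1
   loop_
      _Atom_chem_shift.Seq_ID
      _Atom_chem_shift.Comp_ID
      _Atom_chem_shift.Atom_ID
      _Atom_chem_shift.Val
      1  MET  H   8.42
      1  MET  CA  55.6
      2  GLN  H   8.31
   stop_
save_
"""


def parse_simple_star(text):
    """Parse simplified NMR-STAR text into nested dicts and lists (sketch).

    Each saveframe becomes a dict of tag -> value; each loop becomes a list
    of row dicts stored under a hypothetical "loop_<n>" key. Values are split
    on whitespace, so quoted and multi-line values are not supported.
    """
    entry = {}
    saveframe = None
    loop_tags, loop_values, in_loop, loop_count = [], [], False, 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        first = tokens[0]
        if first.startswith("data_"):
            entry["data"] = first[len("data_"):]
        elif first == "save_":                  # end of the current saveframe
            saveframe = None
        elif first.startswith("save_"):         # start of a new saveframe
            saveframe, loop_count = {}, 0
            entry[first] = saveframe
        elif first == "loop_":
            loop_tags, loop_values, in_loop = [], [], True
        elif first == "stop_":                  # group the value stream into rows
            n = len(loop_tags)
            rows = [loop_values[i:i + n] for i in range(0, len(loop_values), n)]
            saveframe[f"loop_{loop_count}"] = [dict(zip(loop_tags, r)) for r in rows]
            loop_count += 1
            in_loop = False
        elif in_loop and first.startswith("_") and not loop_values:
            loop_tags.append(first)             # loop column header
        elif in_loop:
            loop_values.extend(tokens)          # a row may wrap across lines
        elif first.startswith("_"):             # tag-value pair in a saveframe
            saveframe[first] = " ".join(tokens[1:])
    return entry


entry = parse_simple_star(STAR_TEXT)
shifts = entry["save_assigned_chem_shift_list_1"]["loop_0"]
print(shifts[1]["_Atom_chem_shift.Val"])   # 55.6 (stored as a string)
print(json.dumps(entry, indent=2))         # JSON view of the same structure
```

Because the result is built only from dicts, lists, and strings, `json.dumps` followed by `json.loads` reproduces it unchanged, which illustrates why a dictionary-based representation converts so directly to JSON.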

Highlights

The Biological Magnetic Resonance Data Bank (BMRB) is a public repository of Nuclear Magnetic Resonance (NMR) spectroscopic data of biological macromolecules
The nmrstarlib package can be used in two ways: 1) as a library for accessing and manipulating data stored in NMR-STAR (Self-defining Text Archival and Retrieval) formatted files, converting between NMR-STAR and its equivalent JavaScript Object Notation (JSON) format, and visualizing assigned chemical shift values; or 2) as a standalone command-line tool for converting files in bulk and visualizing assigned chemical shift values
We found that, on the hardware used for testing, nmrstarlib’s average reading speed is 1,700 kilobytes per second (KB/sec) for NMR-STAR 3.1 and 3,290 KB/sec for NMR-STAR 2.1 with the Python implementation, and 4,421 KB/sec (NMR-STAR 3.1) and 3,351 KB/sec (NMR-STAR 2.1) with the Cython implementation
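The reading speeds in the highlights are throughput figures: kilobytes of input read per second. The sketch below shows one way such a figure can be measured for any parsing function and how to turn a reported speed into an expected read time. It is not the benchmark the authors ran; the helper names are hypothetical, `str.split` merely stands in for a real parser, and 1 KB is assumed to mean 1,024 bytes, since the exact definition is not stated here.

```python
import time


def read_speed_kb_per_sec(parse, text, repeats=5):
    """Best-of-`repeats` throughput of parse(text), in KB/sec.

    Assumes 1 KB = 1,024 bytes of UTF-8 encoded input.
    """
    size_kb = len(text.encode("utf-8")) / 1024
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - start)
    return size_kb / max(best, 1e-9)  # guard against a zero timing


def estimated_read_seconds(size_kb, speed_kb_per_sec):
    """Time needed to read `size_kb` kilobytes at a given throughput."""
    return size_kb / speed_kb_per_sec


# Average speeds (KB/sec) as reported in the highlights, keyed by
# (implementation, NMR-STAR version).
SPEEDS = {
    ("python", "3.1"): 1700, ("python", "2.1"): 3290,
    ("cython", "3.1"): 4421, ("cython", "2.1"): 3351,
}

if __name__ == "__main__":
    # A 1,000 KB NMR-STAR 3.1 file: about 0.59 s (Python) vs 0.23 s (Cython).
    for impl in ("python", "cython"):
        print(impl, round(estimated_read_seconds(1000, SPEEDS[(impl, "3.1")]), 2))
    # Throughput of a stand-in "parser" (whitespace tokenizing) on dummy text.
    print(read_speed_kb_per_sec(str.split, "1 MET CA 55.6\n" * 10000))
```

Taking the best of several runs reduces noise from other processes; measured numbers will of course depend on the machine, as the highlights note.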

Summary

Introduction

The Biological Magnetic Resonance Data Bank (BMRB) is a free, publicly accessible repository of data on peptides, proteins, and nucleic acids obtained through NMR spectroscopy [1], and it is part of the Worldwide Protein Data Bank (wwPDB) [2]. It currently consists of more than 11,000 individual NMR-STAR file entries, containing a wide range of NMR spectral data, experimental details, and biochemical data collected from thousands of biological samples. Entries are available in two versions of the NMR-STAR format, 2.1 and 3.1. While both versions use the same NMR-STAR format at the most general level, the layout of the data in the two versions is different.

Journal: BMC Bioinformatics	Publication Date: Mar 17, 2017
Citations: 8	License type: open-access
