Abstract
Background: Biomacromolecular structural data have outgrown the legacy Protein Data Bank (PDB) format, on which the scientific community relied for decades, yet its successor, the PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF), is still not widely used. One reason may be that many easy-to-use tools support only the legacy format. Another is the inherent difficulty of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, the new format must be fully adopted as soon as possible to fully exploit macromolecular structure data and their associated annotations, such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods.
Results: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily used in the processes of the Protein Data Bank in Europe (PDBe). The package is freely available from both the PyPI repository (http://pypi.org/project/pdbecif) and GitHub (https://github.com/pdbeurope/pdbecif), along with rich documentation and many ready-to-use examples.
Conclusions: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with third-party libraries and adopted for broad scientific analyses.
Highlights
Biomacromolecular structural data have outgrown the legacy Protein Data Bank (PDB) format, on which the scientific community relied for decades, yet its successor, the PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF), is still not widely used.
Since the Crystallographic Information File (CIF) format is derived from the syntax of the Self-defining Text Archive and Retrieval (STAR) format [10], PDBeCIF relies on a community-established solution for tokenization to aid file interpretation.
We present PDBeCIF, a lightweight, general-purpose Python package.
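The STAR-derived syntax and the parsing edge cases mentioned above can be illustrated with a small example. The following is a minimal, hypothetical Python 3 sketch, not PDBeCIF's implementation or API: the functions tokenize and parse, the nested {block: {category: {item: [values]}}} layout, and the made-up sample data are assumptions for illustration only. It handles comments, quoted values containing spaces or apostrophes (such as the nucleic-acid atom name O5'), semicolon-delimited multi-line text fields, and loop_ tables, but not save frames, global_ blocks, or CIF 2.0 syntax.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Hypothetical sketch of simplified CIF 1.1 / STAR tokenization, not PDBeCIF code.
_TOKEN_RE = re.compile(
    r"""\s*(?:
        (?P<comment>\#.*)          # '#' starting a token runs to end of line
      | '(?P<sq>.*?)'(?=\s|$)      # a quote closes only before whitespace/EOL,
      | "(?P<dq>.*?)"(?=\s|$)      # so "O5'" keeps its inner apostrophe
      | (?P<bare>\S+)              # anything else is a bare word
    )""",
    re.VERBOSE,
)

def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) tokens, where kind is data, loop, tag or value."""
    lines, i = text.splitlines(), 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):  # ';' in column 1 opens a multi-line text field
            parts, i = [line[1:]], i + 1
            while i < len(lines) and not lines[i].startswith(";"):
                parts.append(lines[i])
                i += 1
            if i == len(lines):
                raise ValueError("unterminated semicolon text field")
            yield "value", "\n".join(parts).strip()  # simplification: trim whitespace
            line = lines[i][1:]  # tokens may follow the closing ';'
        i += 1
        pos = 0
        while True:
            m = _TOKEN_RE.match(line, pos)
            if m is None or m.group("comment") is not None:
                break  # only whitespace or a comment is left on this line
            pos = m.end()
            quoted = m.group("sq") if m.group("sq") is not None else m.group("dq")
            word = m.group("bare")
            if quoted is not None:
                yield "value", quoted
            elif word.lower().startswith("data_"):
                yield "data", word[5:]
            elif word.lower() == "loop_":
                yield "loop", ""
            elif word.startswith("_"):
                yield "tag", word
            else:
                yield "value", word  # includes '?' (unknown) and '.' (inapplicable)

def parse(text: str) -> Dict[str, Dict[str, Dict[str, List[str]]]]:
    """Group tokens into {block: {category: {item: [values]}}}."""
    tokens = list(tokenize(text))
    blocks: Dict[str, Dict[str, Dict[str, List[str]]]] = {}
    current = None
    def store(tag: str, values: List[str]) -> None:
        if current is None:
            raise ValueError(f"{tag} appears before any data_ block")
        category, _, item = tag[1:].partition(".")
        current.setdefault(category, {})[item] = values
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        i += 1
        if kind == "data":
            current = blocks.setdefault(value, {})
        elif kind == "tag":  # unlooped pair: exactly one value follows the tag
            if i >= len(tokens) or tokens[i][0] != "value":
                raise ValueError(f"{value} has no value")
            store(value, [tokens[i][1]])
            i += 1
        elif kind == "loop":  # tag header, then values listed row by row
            tags, values = [], []
            while i < len(tokens) and tokens[i][0] == "tag":
                tags.append(tokens[i][1])
                i += 1
            while i < len(tokens) and tokens[i][0] == "value":
                values.append(tokens[i][1])
                i += 1
            if not tags or len(values) % len(tags):
                raise ValueError("loop_ values do not fill whole rows")
            for col, tag in enumerate(tags):
                store(tag, values[col::len(tags)])  # every len(tags)-th value
        else:
            raise ValueError(f"unexpected value {value!r}")
    return blocks

SAMPLE = """data_example
_struct.title 'Structure of an example protein'  # trailing comment
_struct_keywords.text
;HYDROLASE,
EXAMPLE
;
loop_
_atom_site.label_atom_id
_atom_site.label_comp_id
N     ALA
"O5'" DA
C1'   DA
"""
```

With this made-up sample, parse(SAMPLE)["example"]["atom_site"]["label_atom_id"] evaluates to ['N', "O5'", "C1'"], and the semicolon field under _struct_keywords.text comes back as a single two-line string. Storing every item as a list, even for unlooped pairs, gives looped and unlooped categories the same shape; a complete reader would still need to cover the syntax this sketch leaves out.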
Summary
One of the advantages of the PDBx/mmCIF file format is the inclusion of additional information alongside the coordinates, which makes the data compliant with the FAIR principles and provides a more complete biological context. In many cases, this information is fragmented and can only be obtained by combining different specialist resources; for example, PDBe makes available updated PDBx/mmCIF files that feature such additional information. These files promote consistent, standardized metadata on top of the core PDB archive information, facilitating further expansion of the core Exchange Dictionary.