Abstract
Summary: Structural biology relies on specific file formats to convey information about macromolecular structures. Traditionally this has been the PDB format, but newer formats such as PDBML, mmCIF and MMTF are increasingly being used. Here we present atomium, a modern, lightweight Python library for parsing, manipulating and saving PDB, mmCIF and MMTF files. In addition, we provide a web service, pdb2json, which uses atomium to give a consistent JSON representation of the entire Protein Data Bank.
Availability and implementation: atomium is implemented in Python and its performance is equivalent to that of the existing library BioPython. However, it has significant advantages in features and API design. atomium is available from atomium.bioinf.org.uk and pdb2json can be accessed at pdb2json.bioinf.org.uk.
Supplementary information: Supplementary data are available at Bioinformatics online.
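To illustrate what a format-independent JSON representation involves, here is a minimal standard-library sketch that reads a tiny mmCIF `_atom_site` loop and emits JSON. It does not use atomium or pdb2json: the function names `parse_atom_site_loop` and `to_json` and the JSON field names are hypothetical, and the reader assumes a single loop with unquoted, whitespace-separated values (real mmCIF files can contain quoted and multi-line values). Unlike the PDB format, mmCIF identifies each column by item name rather than by character position, which is why the rows are zipped against the declared column names.

```python
import json

def parse_atom_site_loop(text):
    """Parse a minimal mmCIF _atom_site loop into dicts keyed by item name.

    Assumption: one loop, unquoted values, one row per line.
    """
    columns, rows = [], []
    for line in text.strip().splitlines():
        line = line.strip()
        # Skip blank lines, comments, the data block header and the loop keyword.
        if not line or line == "loop_" or line.startswith(("data_", "#")):
            continue
        if line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])  # e.g. "Cartn_x"
        else:
            rows.append(dict(zip(columns, line.split())))
    return rows

def to_json(rows):
    """Serialise parsed atoms using a hypothetical, format-independent schema."""
    atoms = [
        {
            "id": int(r["id"]),
            "element": r["type_symbol"],
            "name": r["label_atom_id"],
            "residue": r["label_comp_id"],
            "chain": r["auth_asym_id"],
            "residue_number": int(r["auth_seq_id"]),
            "coordinates": [float(r["Cartn_x"]), float(r["Cartn_y"]), float(r["Cartn_z"])],
        }
        for r in rows
    ]
    return json.dumps({"atoms": atoms}, indent=2)

CIF_SNIPPET = """
data_example
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.auth_asym_id
_atom_site.auth_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N N THR A 1 17.047 14.099 3.625
ATOM 2 C CA THR A 1 16.967 12.784 4.338
"""

json_doc = to_json(parse_atom_site_loop(CIF_SNIPPET))
print(json_doc)
```

A PDB or MMTF reader that produced the same per-atom dicts could feed the same `to_json` function, which is the sense in which one JSON representation can sit on top of several input formats.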
Highlights
For crystallographic entries, the coordinates in a structure file usually describe the asymmetric unit, which may not be the biologically relevant structure, so these files also contain biological assembly instructions: transformation matrices that are applied to the polymers in the structure to recreate the biologically relevant structure. atomium can generate new models from the asymmetric unit coordinates using a single function.
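The assembly step amounts to applying, for each operation, a 3x3 rotation matrix R and a translation vector t to every coordinate r of the selected chains, giving r' = Rr + t. The plain-Python sketch below shows that arithmetic; it is not atomium's assembly function, and `build_assembly`, the `(chain_ids, rotation, translation)` operation layout and the coordinates are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

def apply_transformation(coord: Coord,
                         rotation: Sequence[Sequence[float]],
                         translation: Sequence[float]) -> Coord:
    """Return R*r + t for a single coordinate r."""
    x, y, z = (
        sum(rotation[i][j] * coord[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)

def build_assembly(chains: Dict[str, List[Coord]], operations) -> List[Tuple[str, List[Coord]]]:
    """Emit one transformed copy of each selected chain per operation.

    chains: chain ID -> coordinates from the asymmetric unit.
    operations: list of (chain_ids, rotation, translation) tuples (hypothetical layout).
    """
    assembly = []
    for chain_ids, rotation, translation in operations:
        for chain_id in chain_ids:
            moved = [apply_transformation(c, rotation, translation) for c in chains[chain_id]]
            assembly.append((chain_id, moved))
    return assembly

# Example: a dimer built from one chain using the identity plus a 180-degree
# rotation about the z axis combined with a shift of 10 along x.
asym_unit = {"A": [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]}
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
ops = [(["A"], identity, (0.0, 0.0, 0.0)),
       (["A"], rot_z_180, (10.0, 0.0, 0.0))]

dimer = build_assembly(asym_unit, ops)
for chain_id, coords in dimer:
    print(chain_id, coords)
# A [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
# A [(9.0, -2.0, 3.0), (6.0, -5.0, 6.0)]
```

Real assembly instructions (for example, the BIOMT lines in REMARK 350 of PDB files) can list several operations per assembly; this sketch simply emits one copy of each selected chain per operation and leaves chain renaming to the caller.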
Summary
Structural biology is the study of biological macromolecules at the molecular level: the arrangement of their atoms in space, and how this atomic structure dictates their function. Any computational analysis of these structures requires a representation of them stored on disk, and from the early days of structural biology the PDB (Protein Data Bank) file format has been used for this purpose (Bernstein et al., 1977). This format uses 80-character lines, with fields defined by their position along the line, to represent information about the atoms in a structure. Libraries for reading and working with such files exist for various languages, such as BioJava for Java (Lafita et al., 2019) and BiopLib for C (Porter and Martin, 2015). These libraries provide the user with a standard interface for analysing very diverse structures by representing them in terms of a small number of object types, such as atoms, residues and chains, and they provide a layer of abstraction that makes more complex tasks, such as creating scoring functions, more straightforward.
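To make the fixed-column layout and the atom/residue/chain object model concrete, here is a minimal standard-library sketch. It is not the API of atomium, BioJava or BiopLib: `parse_atom_line`, `build_hierarchy`, the `Atom` class and the records are made up for illustration, and only a subset of ATOM/HETATM fields is read. The column positions in the comments follow the wwPDB format description.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Atom:
    serial: int
    name: str
    x: float
    y: float
    z: float

def parse_atom_line(line):
    """Read one ATOM/HETATM record by fixed character positions."""
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }

def build_hierarchy(lines):
    """Group atoms into chain -> (residue number, name) -> list of Atom."""
    chains = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END and other records
        rec = parse_atom_line(line)
        residue_key = (rec["res_seq"], rec["res_name"])
        chains[rec["chain"]][residue_key].append(
            Atom(rec["serial"], rec["name"], rec["x"], rec["y"], rec["z"]))
    # Convert to plain dicts so later lookups do not silently create entries.
    return {chain_id: dict(residues) for chain_id, residues in chains.items()}

pdb_lines = [
    "HEADER    MADE-UP EXAMPLE",
    "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N",
    "ATOM      2  CA  THR A   1      16.967  12.784   4.338  1.00 10.80           C",
    "ATOM      3  N   GLY B   1       1.000  -2.500  10.250  1.00 20.00           N",
    "END",
]

structure = build_hierarchy(pdb_lines)
for chain_id, residues in structure.items():
    for (seq, res_name), atoms in residues.items():
        print(chain_id, seq, res_name, [a.name for a in atoms])
# A 1 THR ['N', 'CA']
# B 1 GLY ['N']
```

Slicing by position rather than splitting on whitespace matters because adjacent PDB fields can run into each other (for example, large negative coordinates fill their whole 8-character field).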