Abstract
Recent improvements in computational and experimental techniques for obtaining protein structures have resulted in an explosion of 3D coordinate data. To cope with the ever-increasing sizes of structure databases, this work proposes the Protein Data Compression (PDC) format, which compresses the coordinates and temperature factors of full-atomic and Cα-only protein structures. Without loss of precision, PDC produces files 69% to 78% smaller than Protein Data Bank (PDB) and macromolecular Crystallographic Information File (mmCIF) files with standard GZIP compression. It also uses ∼60% less space than existing compression algorithms designed specifically for macromolecular structures. PDC can optionally perform lossy compression with minimal sacrifice of precision, which reduces file sizes by a further 79%. Conversion between the PDC, mmCIF and PDB formats typically takes less than 0.02 s. The compactness and fast reading/writing speed of PDC make it valuable for storing and analysing large quantities of tertiary structural data. Database URL: https://github.com/kad-ecoli/pdc.
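The abstract does not spell out how PDC encodes its data. As a generic illustration of why fixed-precision coordinates compress well without loss, and how a lossy mode can trade precision for size, the Python sketch below scales coordinates to integers, delta-encodes each axis against the previous atom, and applies zlib. It assumes coordinates are given to three decimal places, as in PDB ATOM records. The function names `compress_coords`/`decompress_coords` and the `decimals` parameter are hypothetical; this is not the PDC algorithm or file layout.

```python
import struct
import zlib


def compress_coords(coords, decimals=3):
    """Pack a list of (x, y, z) tuples into a compressed byte string.

    Hypothetical illustration, not PDC: decimals=3 is lossless for
    PDB-style coordinates; fewer decimals give lossy compression.
    """
    scale = 10 ** decimals
    # Fixed-point integers in units of 10**-decimals Angstrom.
    ints = [round(v * scale) for xyz in coords for v in xyz]
    # Delta-encode each axis against the same axis of the previous atom;
    # consecutive atoms are close in space, so most deltas are small numbers.
    deltas = ints[:3] + [ints[i] - ints[i - 3] for i in range(3, len(ints))]
    header = struct.pack("<B", decimals)  # store the precision for decoding
    body = struct.pack(f"<{len(deltas)}i", *deltas)  # signed 32-bit integers
    return zlib.compress(header + body, 9)


def decompress_coords(blob):
    """Invert compress_coords, returning a list of (x, y, z) tuples."""
    raw = zlib.decompress(blob)
    scale = 10 ** raw[0]
    deltas = struct.unpack(f"<{(len(raw) - 1) // 4}i", raw[1:])
    # Undo the per-axis delta encoding with a running sum.
    ints = list(deltas[:3])
    for i in range(3, len(deltas)):
        ints.append(ints[i - 3] + deltas[i])
    return [tuple(v / scale for v in ints[i:i + 3]) for i in range(0, len(ints), 3)]
```

With the default `decimals=3`, a round trip reproduces three-decimal input exactly. With a smaller value such as `decimals=1`, each coordinate is off by at most 0.5 × 10⁻¹ Å. The same idea would apply to temperature factors, which PDB files store to two decimal places.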
Published in: Database: The Journal of Biological Databases and Curation