The Protein Data Bank (PDB) was founded in 1971 as the first open-access digital data resource in biology to serve as the single global archive for three-dimensional (3D) macromolecular structure data. Current PDB holdings exceed 230,000 experimentally determined structures of proteins, nucleic acids, viruses, and macromolecular machines. The research-focused RCSB Protein Data Bank (RCSB PDB) web portal, RCSB.org, facilitates search, analysis, and visualization of every PDB structure along with more than one million Computed Structure Models from AlphaFold DB and the ModelArchive. The portal is powered by a set of publicly available Application Programming Interfaces (APIs) that both support RCSB.org users and provide programmatic access to PDB data. Given the breadth of this data collection and the many levels of granularity it encompasses, accessing the information efficiently through the APIs can be challenging for new users. RCSB PDB has developed a Python software package, rcsb-api, that simplifies use of the RCSB PDB APIs within a Python environment. The package is designed to streamline access to the extensive corpus of data housed within the PDB, enabling researchers to search, retrieve, and analyze 3D biostructure data. Its use is expected to accelerate research in structural biology, molecular biology and biochemistry, drug discovery, and bioinformatics by providing more efficient tools for data integration and analysis. The toolkit is available on GitHub (github.com/rcsb/py-rcsb-api) and is published to the Python Package Index (PyPI) to foster wider usage and support basic and applied research in fundamental biology, biomedicine, and the energy sciences.