Abstract

With over 150,000 entries, the Worldwide Protein Data Bank (PDB) is the primary repository for 3D macromolecular structures. Unfortunately, structural, annotational, and ambiguity errors exist for carbohydrates throughout the database, due in part to the lack of carbohydrate‐specific tools for checking the quality of structures prior to deposition. Our group has partnered with the PDB Biocuration team to assist in the identification and remediation of carbohydrates in their database, and we have developed a user‐friendly web interface called GlyFinder to accurately find, retrieve, and assess these glycans and glycoproteins.

Using GlyFinder, we have found that 45,852 of the PDB entries contain carbohydrates (30.1% of the PDB). Nearly 6,000 glycoproteins have been identified, with an average of five N‐linked glycans per glycoprotein. Only 415 glycoproteins contained O‐linked glycans, with an average of three O‐linked glycans each. A surprisingly high number of glycoprotein PDB entries (500, 7.45%) contain one or more N‐linked glycans that are alpha‐linked to the asparagine, illustrating the unfortunate errors that sometimes exist in the data.

Because of such errors, and because the glycans in crystal structures are often truncated, we have also developed tools (Glycoprotein Builder) that can build realistic models of the glycoprotein with intact glycans, employing the crystal structure of the protein core. These models allow us to predict the impact of glycosylation on protein function, antigenicity, immunogenicity, and stability. We illustrate these capabilities for several proteins, including human erythropoietin, HIV gp120, and influenza A hemagglutinin.

Support or Funding Information

Support from NIH U01CA221216, U01GM125267, and P41GM103390.

