Abstract
BackgroundProtein O-GlcNAcylation (or O-GlcNAc-ylation) is an O-linked glycosylation involving the transfer of β-N-acetylglucosamine to the hydroxyl group of serine or threonine residues of proteins. Growing evidences suggest that protein O-GlcNAcylation is common and is analogous to phosphorylation in modulating broad ranges of biological processes. However, compared to phosphorylation, the amount of protein O-GlcNAcylation data is relatively limited and its annotation in databases is scarce. Furthermore, a bioinformatics resource for O-GlcNAcylation is lacking, and an O-GlcNAcylation site prediction tool is much needed.DescriptionWe developed a database of O-GlcNAcylated proteins and sites, dbOGAP, primarily based on literature published since O-GlcNAcylation was first described in 1984. The database currently contains ~800 proteins with experimental O-GlcNAcylation information, of which ~61% are of humans, and 172 proteins have a total of ~400 O-GlcNAcylation sites identified. The O-GlcNAcylated proteins are primarily nucleocytoplasmic, including membrane- and non-membrane bounded organelle-associated proteins. The known O-GlcNAcylated proteins exert a broad range of functions including transcriptional regulation, macromolecular complex assembly, intracellular transport, translation, and regulation of cell growth or death. The database also contains ~365 potential O-GlcNAcylated proteins inferred from known O-GlcNAcylated orthologs. Additional annotations, including other protein posttranslational modifications, biological pathways and disease information are integrated into the database. We developed an O-GlcNAcylation site prediction system, OGlcNAcScan, based on Support Vector Machine and trained using protein sequences with known O-GlcNAcylation sites from dbOGAP. The site prediction system achieved an area under ROC curve of 74.3% in five-fold cross-validation. The dbOGAP website was developed to allow for performing search and query on O-GlcNAcylated proteins and associated literature, as well as for browsing by gene names, organisms or pathways, and downloading of the database. Also available from the website, the OGlcNAcScan tool presents a list of predicted O-GlcNAcylation sites for given protein sequences.ConclusionsdbOGAP is the first public bioinformatics resource to allow systematic access to the O-GlcNAcylated proteins, and related functional information and bibliography, as well as to an O-GlcNAcylation site prediction tool. The resource will facilitate research on O-GlcNAcylation and its proteomic identification.
Highlights
Protein O-GlcNAcylation is an O-linked glycosylation involving the transfer of bN-acetylglucosamine to the hydroxyl group of serine or threonine residues of proteins
O-GlcNAcylation, or O-GlcNAc-ylation to distinguish it from acylation, is an O-linked glycosylation involving the b-attachment of a single N-acetylglucosamine (GlcNAc) to the serine (Ser)/threonine (Thr) residues catalyzed by O-GlcNAc transferase (OGT), whose removal is catalyzed by O-GlcNAcase (OGA) [1]
We report the development of a database of OGlcNAcylated proteins and sites for all currently known O-GlcNAcylated proteins reported from literature, and of an O-GlcNAcylation site prediction system (OGlcNAcScan) based on nearly 400 O-GlcNAcylation sites
Summary
O-GlcNAcylation, or O-GlcNAc-ylation to distinguish it from acylation, is an O-linked glycosylation involving the b-attachment of a single N-acetylglucosamine (GlcNAc) to the serine (Ser)/threonine (Thr) residues catalyzed by O-GlcNAc transferase (OGT), whose removal is catalyzed by O-GlcNAcase (OGA) [1]. We report the development of a database of OGlcNAcylated proteins and sites (dbOGAP) for all currently known O-GlcNAcylated proteins reported from literature, and of an O-GlcNAcylation site prediction system (OGlcNAcScan) based on nearly 400 O-GlcNAcylation sites. Additional protein annotations, including other protein modifications (e.g., phosphorylation) and site features, Gene Ontology, pathways and disease information were integrated into dbOGAP from UniProtKB [18] or iProClass [27] databases. 1. The Database Contents The current version of dbOGAP contains 1163 entries, including 798 experimentally determined O-GlcNAcylated proteins (Figure 2, Left, A), and 365 proteins with inferred O-GlcNAcylation sites (total 735 sites) based on OGlcNAcylated orthologs (Figure 2, Left, D). The website provides functionalities depicted by #1-#4: 1) search and browse the O-GlcNAcylated proteins in the database; 2) de novo prediction of O-GlcNAcylation sites for any protein sequences; 3) user annotation of O-GlcNAcylation information; 4) search and browse the total O-GlcNAcylation bibliography. The data sets for developing the OGlcNAcScan system are available to the scientific community for further development of the site prediction (Figure 4, #2)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.