Recent interest in repeat proteins has arisen due to stable structural folds, high evolutionary conservation and repertoire of functions provided by these proteins. However, repeat proteins are poorly characterized because of high sequence variation between repeating units and structure-based identification and classification of repeats is desirable. Using a robust network-based pipeline, manual curation and Kajava's structure-based classification schema, we have developed a database of tandem structural repeats, Database of Structural Repeats in Proteins (DbStRiPs). A unique feature of this database is that available knowledge on sequence repeat families is incorporated by mapping Pfam classification scheme onto structural classification. Integration of sequence and structure-based classifications help in identifying different functional groups within the same structural subclass, leading to refinement in the annotation of repeat proteins. Analysis of complete Protein Data Bank revealed 16,472 repeat annotations in 15,141 protein chains, one previously uncharacterized novel protein repeat family (PRF), named left-handed beta helix, and 33 protein repeat clusters (PRCs). Based on their unique structural motif, ~79% of these repeat proteins are classified in one of the 14 PRFs or 33 PRCs, and the remaining are grouped as unclassified repeat proteins. Each repeat protein is provided with a detailed annotation in DbStRiPs that includes start and end boundaries of repeating units, copy number, secondary and tertiary structure view, repeat class/subclass, disease association, MSA of repeating units and cross-references to various protein pattern databases, human protein atlas and interaction resources. DbStRiPs provides easy search and download options to high-quality annotations of structural repeat proteins (URL: http://bioinf.iiit.ac.in/dbstrips/).
Read full abstract