PROSITE Patterns Research Articles

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The family of DNA-binding proteins is one of the most populated and most studied across the genomes of bacteria, archaea and eukaryotes, and the Web-based system presented here is an approach to their classification. The DnaProt resource is an annotated and searchable collection of protein sequences for the families of DNA-binding proteins. The database contains 3238 full-length sequences (retrieved from the SWISS-PROT database, release 38), each of which includes at least one DNA-binding domain. Sequence entries are organized into families defined by PROSITE patterns, PRINTS motifs and de novo excised signatures. By combining global similarities and functional motifs into a single classification scheme, DNA-binding proteins are classified into 33 unique classes, which helps to reveal comprehensive family relationships. To maximize family information retrieval, DnaProt contains a collection of multiple alignments for each DNA-binding family, while the recognized motifs can be used as diagnostic functional fingerprints. All available structural class representatives have been referenced. The resource was developed as a Web-based management system for free online access to customized data sets. Entries are fully hyperlinked to facilitate easy retrieval of the original records from the source databases, and functional and phylogenetic annotation will be applied to newly sequenced genomes. The database is freely available from our WWW server (http://kronos.biol.uoa.gr/~mariak/dbDNA.html) for online searches against a library of patterns specific to the identified DNA-binding protein classes and for retrieval of individual entries.
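For readers unfamiliar with how a PROSITE pattern defines family membership, the following is a minimal sketch, not the DnaProt implementation, of translating PROSITE pattern syntax into a regular expression and scanning a sequence with it. The function names `prosite_to_regex` and `find_matches` are hypothetical, the test sequence is invented, and the pattern is written in the form commonly quoted for the C2H2 zinc finger; it should be checked against the PROSITE entry itself before being relied on.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles the common syntax only: '-' separators, 'x' wildcards,
    [..] allowed sets, {..} forbidden sets, (n) / (n,m) repeats,
    '<' and '>' terminal anchors, and the trailing period.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        # [ABC] sets and single residue letters are already valid regex.
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def find_matches(pattern: str, sequence: str):
    """Return (1-based start, matched substring) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

# Illustrative pattern and an invented sequence (not real data).
C2H2 = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
toy_sequence = "MK" + "CAACAAALAAAAAAAAHAAAH" + "GG"
hits = find_matches(C2H2, toy_sequence)
```

A regular expression reports only non-overlapping hits here, and a match says nothing about whether the hit is a true family member, which is why pattern-based classification is usually combined with other evidence such as alignments.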

An amino acid sequence pattern conserved among a family of proteins is called a motif. It is usually related to the specific function of the family. On the other hand, the functions of proteins are realized through their 3D structures. Specific local structures, called structural motifs, are considered to be related to their functions. However, searching for common structural motifs in different proteins is much more difficult than searching for common sequence motifs. In this study we attempt to convert the information about structural motifs into a set of one-dimensional digital strings, i.e., a set of codes, so that they can be compared more easily by computer and their relationship to function can be investigated more quantitatively. By applying Delaunay tessellation to the 3D structure of a protein, we can assign each local structure a unique code that is defined so as to reflect its structural features. Since a structural motif is defined in this paper as a set of local structures, the structural motif is represented by a set of codes. To examine how well the sets of codes distinguish among sets of local structures matching a given PROSITE pattern, which include both true and false positives, we clustered them by introducing a similarity measure between sets of codes. The resulting clustering shows good agreement with results from direct structural comparison methods such as superposition. The structural motifs in homologous proteins are also properly clustered according to their sources. These results suggest that structural motifs can be well characterized by these sets of codes, and that the method can be used to compare structural motifs and relate them to function.
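The idea of comparing structural motifs as sets of codes can be sketched as follows. This is an illustration under stated assumptions, not the method of the paper: the abstract does not specify its similarity measure or clustering procedure, so Jaccard similarity and a simple single-linkage grouping are used as stand-ins, and the code labels and site names are invented.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of codes (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster(motifs, threshold=0.5):
    """Single-linkage grouping: two motifs end up in the same cluster
    when a chain of pairwise similarities >= threshold connects them."""
    parent = {name: name for name in motifs}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(motifs, 2):
        if jaccard(motifs[p], motifs[q]) >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for name in motifs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical motifs: each is the set of codes of its local structures.
motifs = {
    "site_A": {"c01", "c02", "c03", "c04"},
    "site_B": {"c01", "c02", "c03", "c05"},
    "site_C": {"c10", "c11", "c12"},
}
clusters = cluster(motifs, threshold=0.5)
```

With these invented sets, `site_A` and `site_B` share three of five distinct codes and fall into one group, while `site_C` stays apart. In a real analysis the threshold and the measure would need to be chosen against known true and false positives.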

Articles published on PROSITE Patterns

Development of computational tools for the inference of protein interaction specificity rules and functional annotation using structural information

3MATRIX and 3MOTIF: a protein structure visualization system for conserved sequence motifs.

Role of context in the relationship between form and function: structural plasticity of some PROSITE patterns

Regular biosequence pattern matching with cellular automata

Pattern searches for the identification of putative lipoprotein genes in Gram-positive bacterial genomes.

Singular value decomposition analysis of protein sequence alignment score data.

A Web-based classification system of DNA-binding protein families.

1P030 Analysis of local structures in the vicinity of PROSITE patterns

Analysis of protein structural motifs in terms of sets of codes representing local structures

PDBsum: summaries and analyses of PDB structures

HUNT: launch of a full-length cDNA database from the Helix Research Institute.

Searching the Protein Structure Databank with Weak Sequence Patterns and Structural Constraints

Systematic and fully automated identification of protein sequence patterns.

Conformational analysis of long spacers in PROSITE patterns

ProClass protein family database.

Three-dimensional structure analysis of PROSITE patterns

ProClass Protein Family Database.

Variations of the C2H2 zinc finger motif in the yeast genome and classification of yeast zinc finger proteins.

A protein class database organized with ProSite protein groups and PIR superfamilies.

Method for calculation of probability of matching a bounded regular expression in a random data string.
