Pairwise Sequence Identity Research Articles

Amino acid sequence analysis of the PPE proteins in the H37Rv and CDC1551 strains of Mycobacterium tuberculosis identified a previously uncharacterized common region of 225 amino acid residues in 22 proteins. Pairwise sequence identities within this region were as low as 18%. Residues were conserved at fifteen positions distributed over the whole length of the region. The secondary structure of this region is predicted to be a mixture of α-helices and β-strands. Although its function is not known, the proteins that contain this mycobacteria-specific region may share a common function. We further observed a second group of 20 PPE proteins that share a conserved C-terminal region of 44 amino acid residues containing the GFxGT and PxxPxxW sequence motifs. This region is preceded by a hydrophobic region of 40-100 amino acid residues that is flanked by charged residues. Identifying these conserved regions may help detect related proteins in other genomes and guide the design of experiments to test their functions. Amino acid sequence analysis of the PE proteins identified tandem repeats of 41-43 amino acid residues in the C-terminal variable regions of two PE proteins (Rv0978 and Rv0980). These correspond to the AB repeats first identified in some proteins of the Methanosarcina mazei genome, where those proteins were shown to be surface antigens. We also observed AB repeats in several other proteins of hitherto uncharacterized function in archaeal and bacterial genomes. Some of these proteins also carry another repeat, the C-repeat or PKD domain, comprising 85 amino acid residues. The secondary structure of the AB repeat is predicted to consist mainly of four β-strands.
We suggest that proteins with AB repeats in Mycobacterium tuberculosis and other genomes may function as surface antigens. The M. leprae genome, however, contains neither the AB nor the C-repeats, so M. leprae may recruit different proteins as surface antigens than M. tuberculosis does.
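Motifs such as GFxGT and PxxPxxW are written in a PROSITE-like shorthand in which x stands for any residue. As a minimal sketch (not the method used in the study above), such motifs can be scanned with regular expressions in Python. The function names and the toy sequence are hypothetical, and the toy sequence is not a real PPE protein.

```python
import re

# Motifs from the text; capital letters are fixed residues, 'x' is any residue.
MOTIFS = {"GFxGT": "GFxGT", "PxxPxxW": "PxxPxxW"}


def motif_to_regex(motif: str) -> re.Pattern:
    """Convert a simple motif into a compiled regex.

    The pattern is wrapped in a lookahead so that overlapping hits
    are reported as separate matches.
    """
    body = "".join("." if ch == "x" else ch for ch in motif)
    return re.compile(f"(?=({body}))")


def find_motifs(seq: str, motifs: dict) -> list:
    """Return (motif name, 1-based start, matched residues) for every hit."""
    hits = []
    for name, motif in motifs.items():
        for m in motif_to_regex(motif).finditer(seq.upper()):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])  # order hits along the sequence


# Hypothetical toy sequence, NOT a real PPE protein.
toy = "MSAGFAGTLLPAAPSTWKE"
for hit in find_motifs(toy, MOTIFS):
    print(hit)
# ('GFxGT', 4, 'GFAGT')
# ('PxxPxxW', 11, 'PAAPSTW')
```

A regex scan only finds exact pattern matches; it says nothing about whether a hit lies in the conserved C-terminal region, which in practice would be judged from an alignment.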

Nitrogen regulatory (PII) proteins are signal transduction molecules involved in controlling nitrogen metabolism in prokaryotes. PII proteins integrate signals of intracellular nitrogen and carbon status into the control of enzymes involved in nitrogen assimilation. Using elaborate sequence similarity detection schemes, we show that five clusters of orthologous groups (COGs) and several small divergent protein groups belong to the PII superfamily, and we predict their structure to be a (βαβ)₂ ferredoxin-like fold. Proteins from the expanded PII superfamily are present in all major phylogenetic lineages. The PII homologs are highly diverse: pairwise sequence identities between some members of distant groups fall below the level expected for random sequences, to as low as 1%. Despite this sequence diversity, evidence suggests that the different subfamilies retain the trimeric PII structure important for forming the ligand-binding site and preserve the pattern of conserved residues at positions important for PII function. Because most of the orthologous groups within the PII superfamily consist entirely of hypothetical proteins, our remote homology-based structure prediction provides the only information available about them. As in structural genomics efforts, such predictions give clues to the biological roles of these proteins and let us hypothesize about the locations of functional sites on model structures or rationalize available experimental data. For instance, conserved residues in one of the families map close to each other on the PII structure, suggesting a possible metal-binding site in the proteins encoded by a locus known to affect sensitivity to divalent metal ions. This analysis pushes the limits of sequence similarity searches and exemplifies one of the extreme cases of reliable sequence-based structure prediction.
In conjunction with structural genomics efforts to shed light on protein function, our strategies make it possible to detect homology between highly diverse sequences and are aimed at understanding the most remote evolutionary connections in the protein world.
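Pairwise sequence identity is not defined the same way everywhere: it may be normalized by the number of aligned (non-gap) columns, by the alignment length, or by the length of the shorter sequence. The minimal sketch below uses the first convention; it is not necessarily the one used in the study above, and the function names are hypothetical. It also computes the simplest "random" baseline, the identity expected when two unrelated sequences with given compositions are compared position by position without gaps, 100 · Σ p_i q_i. Gapped alignments optimized for score usually give unrelated sequences a higher identity than this baseline, so the relevant "random" level depends on how the alignment was built.

```python
from collections import Counter


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-').

    Other conventions divide by the full alignment length or by the shorter
    sequence length, which gives lower values for gapped alignments.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def expected_random_identity(seq_a: str, seq_b: str) -> float:
    """Expected percent identity for an ungapped, position-by-position comparison
    of two unrelated sequences with these residue compositions: 100 * sum(p_i * q_i).
    """
    a = seq_a.replace("-", "")
    b = seq_b.replace("-", "")
    if not a or not b:
        return 0.0
    fa, fb = Counter(a), Counter(b)
    return 100.0 * sum(fa[r] / len(a) * fb[r] / len(b) for r in fa)


print(percent_identity("ACDE-FG", "ACEEHF-"))  # 80.0 (4 identical of 5 gap-free columns)
print(expected_random_identity("AC", "AC"))    # 50.0
```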

Articles published on Pairwise Sequence Identity

How Well is Enzyme Function Conserved as a Function of Pairwise Sequence Identity?

Inverse sequence similarity of proteins does not imply structural similarity

Sequence analysis corresponding to the PPE and PE proteins in Mycobacterium tuberculosis and other genomes.

Characterization of DNAbeta associated with begomoviruses in China and evidence for co-evolution with their cognate viral DNA-A.

Gene islands integrated into tRNA(Gly) genes confer genome diversity on a Pseudomonas aeruginosa clone.

Sequence conserved for subcellular localization.

Sequence and Structural Differences between Enzyme and Nonenzyme Homologs

Identification and phylogenetic analysis of a glucose transporter gene family from the human pathogenic yeast Candida albicans.

X‐ray structure of Saccharomyces cerevisiae homologous mitochondrial matrix factor 1 (Hmf1)

Expanding the nitrogen regulatory protein superfamily: Homology detection at below random sequence identity.

Enzyme Function Less Conserved than Anticipated

Highly divergent dihydrofolate reductases conserve complex folding mechanisms

Hybridization cross-reactivity within homologous gene families on glass cDNA microarrays.

Picasso: generating a covering set of protein family profiles.

An integrated approach to the analysis and modeling of protein sequences and structures. II. On the relationship between sequence and structural similarity for proteins that are not obviously related in sequence

Comparison of Folding Rates of Homologous Prokaryotic and Eukaryotic Proteins

"Topohydrophobic positions" as key markers of globular protein folds

Twilight zone of protein sequence alignments.

Analysis of two incompletely spliced Arabidopsis cDNAs encoding novel types of peroxidase

Linkers of secondary structures in proteins.
