Abstract

Mass spectrometry-driven BLAST (MS BLAST) is a database search protocol for identifying unknown proteins by sequence similarity to homologous proteins available in a database. MS BLAST utilizes redundant, degenerate, and partially inaccurate peptide sequence data obtained by de novo interpretation of tandem mass spectra and has become a powerful tool in functional proteomic research. Using computational modeling, we evaluated the potential of MS BLAST for proteome-wide identification of unknown proteins. We determined how the success rate of protein identification depends on the full-length sequence identity between the queried protein and its closest homologue in a database. We also estimated phylogenetic distances between organisms under study and related reference organisms with completely sequenced genomes that allow substantial coverage of unknown proteomes.

Highlights

  • Mass spectrometry-driven BLAST (MS BLAST) is a database search protocol for identifying unknown proteins by sequence similarity to homologous proteins available in a database

  • Sensitivity of MS BLAST identification: we were interested in MS BLAST performance in cross-species protein identification with peptide queries produced by the interpretation of tandem mass spectra (MS queries)

  • We investigated the relationship between the rates of true-positive, false-negative, and false-positive identifications by MS BLAST, the overall sequence identity between homologous proteins, and the number of peptides in a query (Fig. 2)

Summary

EXPERIMENTAL PROCEDURES

Computer Simulation Experiments. The WU-BLAST2 program [24] was installed on a local server. Homologues of queried proteins in the neighboring proteomes were determined by WU-BLAST2 searches with full-length sequences under standard settings (substitution matrix BLOSUM62, Expect cutoff 1) [21, 24], and hits with E-values lower than 1E-05 were extracted from the output by a custom sorting script. To simulate MS BLAST queries, peptide sequences of 10 amino acid residues were randomly selected from proteins and merged into search strings. Queries containing 5, 8, 10, 15, and 20 unique peptides were assembled from S. cerevisiae and C. albicans proteins, and queries containing 3, 8, and 15 unique peptides were assembled from S. pombe proteins and from proteins of the three vertebrate species. Thresholds were calculated by performing 5,000 MS BLAST searches for queries composed of each given number of peptide sequences. Queries were assembled from 10-residue peptides obtained in five independent rounds of random selection from 1,000 unique proteins. The evolutionary distance between species was calculated using the program dnadist from the Phylip package [33].
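
The query simulation described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names, the choice to allow overlapping peptide windows, and the "-" separator used to merge peptides into one search string are all assumptions made for illustration.

```python
import random

PEPTIDE_LENGTH = 10  # residues per simulated peptide, as stated in the text


def sample_unique_peptides(sequence, n_peptides, rng, length=PEPTIDE_LENGTH):
    """Randomly pick n_peptides distinct windows of `length` residues.

    Assumption: "unique" means distinct sequences; windows may overlap.
    """
    # Collect every distinct window so the same peptide cannot be drawn twice.
    # Sorting makes the result reproducible for a seeded rng.
    windows = sorted(
        {sequence[i:i + length] for i in range(len(sequence) - length + 1)}
    )
    if len(windows) < n_peptides:
        raise ValueError("protein too short for the requested number of peptides")
    return rng.sample(windows, n_peptides)


def build_query(peptides, separator="-"):
    """Merge peptides into one search string (separator is an assumption)."""
    return separator.join(peptides)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be repeated
    # Hypothetical protein sequence, used only to exercise the functions.
    protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
    for n in (5, 8, 10, 15, 20):  # query sizes used for the yeast proteomes
        print(n, build_query(sample_unique_peptides(protein, n, rng)))
```

Repeating the sampling over many proteins and several independent rounds, as in the protocol, would only require an outer loop around `sample_unique_peptides`.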

RESULTS AND DISCUSSION
CONCLUSION AND PERSPECTIVES