Abstract

Background: BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch.

Results: We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI’s Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST.

Conclusions: DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the “Protein BLAST” link at http://blast.ncbi.nlm.nih.gov.

Reviewers: This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.

Highlights

  • BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences

  • We describe Domain Enhanced Look-up Time Accelerated BLAST (DELTA-BLAST), a new tool that first uses RPS-BLAST to align a query sequence to conserved domains in the Conserved Domain Database (CDD), and then searches a sequence database using a position-specific score matrix (PSSM) derived from the aligned domains

  • This section compares the performance of BLASTP, CS-BLAST, PSI-BLAST, and DELTA-BLAST

Introduction

BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. A PSSM is constructed from a multiple sequence alignment (MSA) of related proteins, and models the amino acid substitutions particular to a specific protein family and sequence position. Biegert and Söding [24] developed Context-Specific BLAST (CS-BLAST), which computes an initial PSSM using a query sequence and a library of short profiles. To build this library, the authors first construct a large number of MSAs by aligning subsets of sequences from the whole nonredundant protein database (NR) [25] with one another, using two iterations of PSI-BLAST.
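
To make the idea of a PSSM concrete, the following is a minimal sketch of how per-column log-odds scores could be derived from a small gap-free MSA. It is an illustration only, not the procedure used by PSI-BLAST, CS-BLAST, or DELTA-BLAST: the function names `build_pssm` and `score_segment`, the uniform background frequencies, and the flat pseudocount of 1 are all assumptions made for this example. Production tools additionally weight sequences and use data-dependent pseudocounts.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa, background=None, pseudocount=1.0):
    """Build a toy PSSM from equal-length aligned sequences.

    Returns one dict per alignment column, mapping each residue to a
    log-odds score in bits: log2(observed frequency / background frequency).
    Assumption: a flat pseudocount is added to every residue count.
    """
    if not msa:
        raise ValueError("MSA must contain at least one sequence")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    if background is None:
        # Assumption: uniform background; real tools use empirical frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

    pssm = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(seq[col] for seq in msa if seq[col] in background)
        total = sum(counts.values()) + pseudocount * len(background)
        column_scores = {}
        for aa, bg_freq in background.items():
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / bg_freq)
        pssm.append(column_scores)
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of a segment aligned to the PSSM."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, segment))


if __name__ == "__main__":
    toy_msa = ["ACD", "ACE", "ACD", "GCD"]  # hypothetical aligned family
    toy_pssm = build_pssm(toy_msa)
    print(round(score_segment(toy_pssm, "ACD"), 3))
    print(round(score_segment(toy_pssm, "WWW"), 3))
```

In this toy example the fully conserved middle column rewards C more strongly than the first column rewards A, and a segment that resembles the family scores higher than an unrelated one. This position dependence is what distinguishes a PSSM from a single substitution matrix applied uniformly along the sequence.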
