Abstract

For large-scale structural assignment to sequences, as in computational structural genomics, a fast yet sensitive homology search procedure is essential. A new approach using intermediate sequences was tested as a shortcut to iterative multiple sequence search methods such as PSI-BLAST and hidden Markov models. A library containing potential intermediate sequences for proteins of known structure (PDB_ISL) was constructed. The sequences in the library were collected from a large sequence database by running the program PSI-BLAST with the sequences of the domains of proteins of known structure as the queries. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using any pairwise sequence comparison method to find homologues in PDB_ISL. Searches of PDB_ISL were calibrated, and the number of correct matches found at a given error rate was the same as that found by PSI-BLAST. The advantage of this library is that it uses pairwise sequence comparison methods, such as FASTA or BLAST2, and can therefore be searched easily and, in many cases, much more quickly than with an iterative multiple sequence comparison method. The procedure is roughly twenty times faster than PSI-BLAST for small genomes and several hundred times faster for large genomes such as C. elegans. Sequences can be submitted to the PDB_ISL servers at http://stash.mrc-lmb.cam.ac.uk/PDB_ISL/ and ftp://ftp.ebi.ac.uk/pub/databases/pdb_isl/
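
The lookup step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the library entries, identifiers, sequences, the `assign_structure` helper and the score threshold are all hypothetical, and Python's `difflib.SequenceMatcher` stands in for a real pairwise comparison method such as FASTA or BLAST2. The sketch only shows the idea that each intermediate sequence carries a label for the known-structure domain whose PSI-BLAST search collected it, so a single pairwise hit is enough to transfer the structural assignment.

```python
from difflib import SequenceMatcher

# Hypothetical toy library. Each entry is
# (intermediate sequence id, known-structure domain that collected it, sequence).
# All identifiers and sequences are invented for illustration.
PDB_ISL = [
    ("inter_1", "domA", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("inter_2", "domA", "MKTAYLAKQRNISFVKAHFSRE"),
    ("inter_3", "domB", "GSSGLVPRGSHMDEELLKKAIE"),
]


def pairwise_score(a, b):
    """Stand-in similarity in [0, 1]; a real search would use FASTA or BLAST2."""
    return SequenceMatcher(None, a, b).ratio()


def assign_structure(query, library, threshold=0.6):
    """Return (domain, intermediate_id, score) for the best hit, or None.

    The threshold is an arbitrary assumption; in practice it would come from
    calibrating the search against a chosen error rate.
    """
    best = None
    for inter_id, domain, seq in library:
        score = pairwise_score(query, seq)
        if score >= threshold and (best is None or score > best[2]):
            best = (domain, inter_id, score)
    return best
```

Calling `assign_structure("GSSGLVPRGSHMDEELLKKAIE", PDB_ISL)` returns a hit labelled `domB`, while a query with no similarity to any library entry returns `None`. Because each query needs only one pass of pairwise comparisons, no iterative profile building is required at search time.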
