Abstract

The Sequence Search Algorithm Assessment and Testing Toolkit (SAT) aims to be a complete package for comparing protein homology search algorithms. The structural classification of proteins provides a clear criterion for judging homology detection. Several assessments based on structurally classified sequences have been carried out, but much of this work is now being repeated with locally developed procedures and programs. SAT provides developers with a complete package that saves time and yields more comparable performance assessments of search algorithms. The package is complete in the sense that it provides a large non-redundant sequence resource database, a well-characterized query database of protein domains, all the necessary parsers, and previous results from PSI-BLAST and a hidden Markov model algorithm.

An analysis of two different data sets was carried out with SAT. It compared the performance of a full protein sequence database (RSDB100) with a non-redundant representative sequence database derived from it (RSDB50). The performance measurements indicated that the full database is sub-optimal for homology searching. This result justifies using the much smaller and faster RSDB50 rather than RSDB100 in SAT.

The whole package, including some previous assessment results produced with it for reference, is accessible via the web and FTP:
ftp://ftp.ebi.ac.uk/pub/contrib/jong/SAT
http://cyrah.ebi.ac.uk:1111/Proj/Bio/SAT
http://www.mrc-lmb.cam.ac.uk/genomes/SAT

Contact: jong@ebi.ac.uk
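
To make the structural-classification criterion concrete, the following is a minimal Python sketch of how search hits might be scored against a classification. It assumes that a hit counts as a true positive when the query and target domains share the same superfamily label, and as a false positive otherwise; the hit tuple layout, the domain identifiers, the labels and the function name `coverage_vs_errors` are all hypothetical and do not describe SAT's actual parsers or file formats. Walking the hits in order of increasing E-value gives a curve of true positives against errors, on which results obtained with two search databases (for example RSDB100 and RSDB50) could be compared.

```python
from typing import Dict, List, Tuple

# Hypothetical hit record: (query domain id, target domain id, E-value).
Hit = Tuple[str, str, float]


def coverage_vs_errors(hits: List[Hit], superfamily: Dict[str, str]) -> List[Tuple[int, int]]:
    """Return (false positives, true positives) after each hit, best E-value first.

    Assumed criterion: a hit is a true positive when query and target carry
    the same superfamily label; otherwise it is a false positive.
    Self-hits (query == target) are skipped.
    """
    curve: List[Tuple[int, int]] = []
    tp = fp = 0
    # Sort by E-value so the curve reflects the ranking a search program produces.
    for query, target, _e_value in sorted(hits, key=lambda h: h[2]):
        if query == target:
            continue
        if superfamily[query] == superfamily[target]:
            tp += 1
        else:
            fp += 1
        curve.append((fp, tp))
    return curve


if __name__ == "__main__":
    # Illustrative labels and hits only; not real SAT data.
    labels = {"d1": "a.1.1", "d2": "a.1.1", "d3": "b.2.1"}
    example_hits = [
        ("d1", "d2", 1e-10),
        ("d1", "d3", 0.5),
        ("d2", "d1", 1e-8),
        ("d1", "d1", 1e-50),
    ]
    print(coverage_vs_errors(example_hits, labels))
```

A database that yields more true positives before the first few errors would, under this assumed criterion, be the better search resource.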
