UniqueProt: Creating representative protein sequence sets.

Sven Mika

doi:10.1093/nar/gkg620

Open Access

Journal: Nucleic acids research	Publication Date: Jul 1, 2003
Citations: 133

Affiliation: Columbia University

#Representative Sets #Data Sets Of Protein Sequences

Abstract

UniqueProt is a practical, easy-to-use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm that uses the HSSP-value to establish sequence similarity. UniqueProt is not a true clustering program, in the sense that the 'representatives' are not at the centres of well-defined clusters, since the definition of such clusters is problem-specific. Overall, UniqueProt is a reasonably fast solution for bias in data sets. The service is accessible at http://cubic.bioc.columbia.edu/services/uniqueprot; a command-line version for Linux is downloadable from this web site.
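
The greedy idea mentioned in the abstract can be illustrated with a minimal sketch. This is not UniqueProt's actual code: the abstract does not specify the greedy rule, so the example assumes one common variant (repeatedly discard the sequence with the most remaining similar neighbours). The function name `greedy_representatives`, the `similarity` callback, and the `threshold` parameter are hypothetical; in practice `similarity` would return an HSSP-value computed from a pairwise alignment, which is not reproduced here.

```python
from typing import Callable, Dict, List, Set


def greedy_representatives(
    ids: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.0,
) -> List[str]:
    """Return a subset of ids in which no pair scores above threshold.

    Assumed greedy rule (not confirmed by the source): repeatedly drop
    the id that still has the most similar neighbours.
    """
    # Record, for every id, which other ids score above the threshold.
    neighbours: Dict[str, Set[str]] = {i: set() for i in ids}
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if similarity(a, b) > threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)

    while True:
        # Pick the id with the most remaining similar neighbours.
        worst = max(neighbours, key=lambda i: len(neighbours[i]), default=None)
        if worst is None or not neighbours[worst]:
            break  # no similar pair is left
        # Remove it and forget it in its neighbours' sets.
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)

    # Preserve the input order of the surviving ids.
    return [i for i in ids if i in neighbours]
```

Removing the most-connected sequence first tends to keep more sequences than scanning the list once and keeping whatever is unrelated to earlier picks, but as a heuristic it does not guarantee the largest possible set in every case.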
