Abstract

The PROSITE collection of patterns for family classification of protein sequences requires much manual labour for motif finding and pattern updating, and yet has only moderate classification accuracy. Out of 1026 families with patterns in PROSITE release 16.0, only 523 (51%) had a diagnostic pattern, i.e., a pattern which discriminates perfectly between family and non-family sequences in the training set. There is therefore a need for reliable methods that automate motif finding and pattern construction, so that improved speed can be combined with greater classification accuracy. In this paper we present our approach to automating the construction of a collection of patterns, and we announce release 1.0 of the pattern collection built by motif-finding by analysis of multiple alignments (MAMA). MAMA is found to improve the classification accuracy over PROSITE by finding many more diagnostic patterns. On 926 tested families, MAMA finds such patterns for 771 (83%). Furthermore, both the average specificity and the average sensitivity of MAMA patterns are found to be higher than for PROSITE. A WWW interface that allows users to submit sequences and scan for matches in the MAMA pattern collection is available at http://www.his.se/ida/mama, together with a listing of all the patterns in MAMA release 1.0.
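As an illustration of what "diagnostic" means here, the following minimal Python sketch scores a PROSITE-style pattern against labelled sequences. It is not the MAMA implementation. The function names `prosite_to_regex` and `evaluate`, the toy sequences, and the example patterns are all hypothetical. The sketch also assumes the common definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP), which may differ from the definitions used in the paper, and it handles only a simplified subset of the PROSITE pattern syntax (no `<` or `>` anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern to a Python regex.

    Handles x (any residue), [..] (allowed set), {..} (forbidden set)
    and (n) / (n,m) repeat counts. Anchors are not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("x", ".")                      # x = any residue
        element = element.replace("{", "[^").replace("}", "]")   # {P} = anything but P
        element = element.replace("(", "{").replace(")", "}")    # (2) = exactly two
        parts.append(element)
    return "".join(parts)


def evaluate(pattern: str, family: list, non_family: list) -> dict:
    """Score a pattern on labelled sequences (both lists must be non-empty)."""
    regex = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for seq in family if regex.search(seq))       # family members matched
    fn = len(family) - tp                                    # family members missed
    fp = sum(1 for seq in non_family if regex.search(seq))   # false matches
    tn = len(non_family) - fp                                # correctly rejected
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Diagnostic: matches every family sequence and no non-family sequence.
        "diagnostic": fn == 0 and fp == 0,
    }


# Toy data, invented purely for illustration.
family = ["AACKLDGHA", "MCQQEAHTT"]
non_family = ["AACKLDPHA", "MKTLLV"]

strict = evaluate("C-x(2)-[DE]-{P}-H", family, non_family)
loose = evaluate("C-x(2)-[DE]", family, non_family)
print(strict)
print(loose)
```

On this toy data the longer pattern separates the two sets perfectly and is therefore diagnostic, whereas the shortened pattern also matches one non-family sequence and so is not.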
