Towards a comprehensive collection of diagnostic patterns for protein sequence classification

Björn Olsson, Kim Laurio

doi:10.1016/s0020-0255(02)00171-8

Abstract

The PROSITE collection of patterns for family classification of protein sequences requires much manual labour for motif finding and pattern updating, and yet has only moderate classification accuracy. Out of 1026 families with patterns in PROSITE release 16.0, only 523 (51%) had a diagnostic pattern, i.e., a pattern which discriminates perfectly between family and non-family sequences in the training set. There is therefore a need for reliable methods that automate motif finding and pattern construction, so that improved speed can be combined with greater classification accuracy. In this paper we present our approach to automating the construction of a collection of patterns, and we announce release 1.0 of the pattern collection built by motif-finding by analysis of multiple alignments (MAMA). MAMA is found to improve the classification accuracy over PROSITE by finding many more diagnostic patterns. On 926 tested families, MAMA finds such patterns for 771 (83%). Furthermore, both the average specificity and the average sensitivity of MAMA patterns are found to be higher than for PROSITE. A WWW interface that allows users to submit sequences and scan for matches in the MAMA pattern collection is available at http://www.his.se/ida/mama, together with a listing of all the patterns in MAMA release 1.0.
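To make the notion of a diagnostic pattern concrete, the following is a minimal sketch of how a PROSITE-style pattern could be checked against a labelled set of sequences. It is not the MAMA method and not PROSITE's own tooling. The function names `prosite_to_regex` and `evaluate`, the toy pattern and the toy sequences are all hypothetical. The sketch assumes sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP); the abstract does not define these terms, and some pattern literature uses "specificity" for TP/(TP+FP) instead.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles the common syntax only: '-' separators, 'x' wildcards,
    [..] allowed sets, {..} forbidden sets, (n) / (n,m) repeats,
    and the '<' / '>' sequence-end anchors.
    """
    regex = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        regex.append(prefix + core + repeat + suffix)
    return "".join(regex)


def evaluate(pattern: str, family: list[str], non_family: list[str]) -> dict:
    """Count matches in both sets and report accuracy measures.

    Assumes both lists are non-empty.
    """
    compiled = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for seq in family if compiled.search(seq))
    fp = sum(1 for seq in non_family if compiled.search(seq))
    fn = len(family) - tp
    tn = len(non_family) - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),   # assumed definition, see lead-in
        # Diagnostic in the abstract's sense: matches every family
        # sequence and no non-family sequence.
        "diagnostic": fn == 0 and fp == 0,
    }


# Toy data, invented for illustration only.
toy_pattern = "C-x(2)-[HK]-{P}-C"
toy_family = ["MKCAAHTCLL", "GGCSSKACQQ"]
toy_non_family = ["MKCAAHPCLL", "AAAAGGGG"]
result = evaluate(toy_pattern, toy_family, toy_non_family)
print(result)
```

On the toy data the pattern matches both family sequences and neither non-family sequence, so it counts as diagnostic. Adding a single non-family sequence that matches would leave sensitivity unchanged but make the pattern non-diagnostic, which is the distinction the 51% and 83% figures above are measuring.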


Journal: Information Sciences
Publication Date: Apr 7, 2002
Citations: 3

