Discovering Interesting Motif-Sets for Multi-Class Protein Sequence Classification

Patrick C.H. Ma, Keith C.C. Chan

doi:10.1089/cmb.2008.0213

Abstract

In this article, we propose an effective data mining technique for multi-class protein sequence classification. The technique, which can discover discriminative motif-sets for classification, performs its tasks in two phases. In Phase 1, it makes use of a popular motif discovery algorithm called MEME (Multiple Expectation Maximization for Motif Elicitation) to discover a set of highly conserved motifs in each protein family of training sequences. The highly conserved motif-sets discovered in each family may overlap with each other and may therefore not be unique enough to allow them to be used for classification. Phase 2, therefore, makes use of a pattern discovery approach to discover the interesting motif-sets in each protein family that are useful for classification with a single classifier. Based on these motif-sets, the functional family of each independent testing sequence can then be determined. For experimentation, the proposed technique has been tested with different sets of protein sequences. Experimental results show that it outperforms other existing protein sequence classifiers and can effectively classify proteins into their corresponding functional families. In addition, the motif-sets discovered during the training process have been found to be biologically meaningful.
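The two-phase idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: Phase 1 is replaced by hand-written regular-expression motifs (the paper uses MEME), Phase 2 is replaced by a simple support-difference score (the paper's pattern discovery measure is not specified in the abstract), and the family names, sequences, `min_gap` threshold, and all function names are hypothetical.

```python
import re
from itertools import combinations

# Stand-in for Phase 1 output. In the paper these motifs are found by MEME;
# here they are hypothetical hand-written regular expressions.
FAMILY_MOTIFS = {
    "famA": ["GKT", "DE.D"],
    "famB": ["GKT", "C..C"],
}

# Hypothetical toy training sequences, grouped by family.
TRAIN = {
    "famA": ["MAGKTLLDEADQ", "GKTAADEKDLL", "TTGKTDEQDPP"],
    "famB": ["GKTPPCAACLL", "MMCGGCGKTAA", "GKTCLLCQQ"],
}


def contains_all(seq, motif_set):
    """True if every motif in motif_set occurs somewhere in seq."""
    return all(re.search(m, seq) for m in motif_set)


def support(motif_set, seqs):
    """Fraction of sequences that contain the whole motif-set."""
    return sum(contains_all(s, motif_set) for s in seqs) / len(seqs)


def discriminative_motif_sets(family_motifs, train, min_gap=0.5):
    """Stand-in for Phase 2: keep motif-sets that are much more frequent
    inside their own family than in all other families combined.
    The support-difference score is an assumption, not the paper's measure."""
    selected = {}
    for fam, motifs in family_motifs.items():
        others = [s for f, seqs in train.items() if f != fam for s in seqs]
        kept = []
        for size in range(1, len(motifs) + 1):
            for motif_set in combinations(motifs, size):
                gap = support(motif_set, train[fam]) - support(motif_set, others)
                if gap >= min_gap:
                    kept.append((motif_set, gap))
        selected[fam] = kept
    return selected


def classify(seq, selected):
    """Assign seq to the family whose matching motif-sets have the largest
    total score; return None when no selected motif-set matches."""
    scores = {
        fam: sum(gap for motif_set, gap in sets if contains_all(seq, motif_set))
        for fam, sets in selected.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


SELECTED = discriminative_motif_sets(FAMILY_MOTIFS, TRAIN)
```

In this toy data the motif `GKT` is conserved in both families, so it is discarded on its own, which mirrors the abstract's point that conserved motifs can overlap between families and are not necessarily useful for classification. Calling `classify("AAGKTWWDERDAA", SELECTED)` then relies only on the motif-sets that separate the two families.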


Journal: Journal of computational biology : a journal of computational molecular cell biology
Publication Date: May 1, 2010
Citations: 1

Similar Papers

An artificial intelligence approach to motif discovery in protein sequences: application to steroid dehydrogenases.
Timothy L Bailey ... Charles P Elkan
Journal of steroid biochemistry | VOL. 62
01 May 1997

An Effective Data Mining Technique for the Multi-Class Protein Sequence Classification
Patrick C H Ma ... Keith C C Chan
01 May 2008

SPLASH: structural pattern localization analysis by sequential histograms.
Andrea Califano
Computer applications in the biosciences : CABIOS | VOL. 16
01 Apr 2000

Protein Sequences Classification Using Modular RBF Neural Networks
Dianhui Wang ... N.K Lee
01 Jan 2002
