A Sequential Method for Discovering Probabilistic Motifs in Proteins

A Likas, D I Fotiadis, K Blekas

doi:10.1055/s-0038-1633414

Abstract

This paper proposes a greedy algorithm that learns a mixture-of-motifs model by likelihood maximization in order to discover common substrings, known as motifs, in a given collection of related biosequences. The approach adds motif components to the mixture model one at a time, initializing the parameters of each new component through a combined global and local search. A hierarchical clustering step is applied first to identify candidate motif models, which speeds up the global search. The performance of the proposed algorithm has been studied on both artificial and real biological datasets. Compared with the well-known MEME approach, the algorithm identifies motifs with significant conservation and produces larger protein fingerprints. The proposed greedy algorithm is a promising approach for discovering multiple probabilistic motifs in biological sequences. By using an effective incremental mixture-modeling strategy, our technique overcomes a limitation of MEME, which erases motif occurrences each time a new motif is discovered.
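The abstract summarizes the method without giving its update equations. The sketch that follows is a minimal Python illustration of greedy mixture-of-motifs learning under stated assumptions; it is not the authors' implementation. Assumed (hypothetical) choices: every overlapping length-`w` window of the input is one data point; the mixture has a fixed background component plus motif components, each a position-specific residue distribution over the 20 standard amino acids; a new component is inserted as `(1 - alpha) * old_mixture + alpha * new_component`, with the global search scoring a small grid of `alpha` values against a set of candidate seeds; the paper's hierarchical clustering step is replaced by simply taking the most frequent distinct windows as seeds; and the local search is plain EM over all mixing weights and motif profiles. All function and parameter names (`greedy_motifs`, `seed_profile`, `alphas`, `n_candidates`, and so on) are illustrative.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (input assumed to use only these)


def windows(seqs, w):
    """Assumption: every overlapping length-w substring is one data point."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]


def background(seqs):
    """Position-independent residue frequencies with add-one smoothing."""
    counts = Counter("".join(seqs))
    total = sum(counts[a] + 1 for a in ALPHABET)
    return {a: (counts[a] + 1) / total for a in ALPHABET}


def seed_profile(seed, bg, weight=0.5):
    """Motif profile (one residue distribution per position) biased toward a seed."""
    return [{a: (1 - weight) * bg[a] + (weight if a == c else 0.0) for a in ALPHABET}
            for c in seed]


def log_p(x, profile):
    """Log-probability of window x under a position-specific profile."""
    return sum(math.log(col[c]) for col, c in zip(profile, x))


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_likelihood(data, pis, profiles):
    return sum(logsumexp([math.log(p) + log_p(x, prof) for p, prof in zip(pis, profiles)])
               for x in data)


def em(data, pis, profiles, iters=20, pseudo=0.1):
    """Local search: EM over all mixing weights and motif profiles.
    Component 0 is the background and is kept fixed."""
    w, n = len(profiles[0]), len(data)
    for _ in range(iters):
        # E-step: posterior probability of each component for each window
        resp = []
        for x in data:
            logs = [math.log(p) + log_p(x, prof) for p, prof in zip(pis, profiles)]
            z = logsumexp(logs)
            resp.append([math.exp(v - z) for v in logs])
        # M-step: mixing weights (floored to avoid log(0)), then motif columns
        pis = [max(sum(r[j] for r in resp) / n, 1e-12) for j in range(len(pis))]
        for j in range(1, len(profiles)):
            cols = []
            for k in range(w):
                counts = {a: pseudo for a in ALPHABET}
                for x, r in zip(data, resp):
                    counts[x[k]] += r[j]
                total = sum(counts.values())
                cols.append({a: c / total for a, c in counts.items()})
            profiles[j] = cols
    return pis, profiles


def greedy_motifs(seqs, w, n_motifs, alphas=(0.05, 0.1, 0.2), n_candidates=10):
    data, bg = windows(seqs, w), background(seqs)
    pis, profiles = [1.0], [[bg] * w]  # start from a background-only model
    # Hypothetical stand-in for the hierarchical clustering step: take the
    # most frequent distinct windows as candidate motif seeds.
    seeds = [s for s, _ in Counter(data).most_common(n_candidates)]
    for _ in range(n_motifs):
        # Global search: score every (seed, alpha) insertion of a new component,
        # mixing it in as (1 - alpha) * old_mixture + alpha * new_component.
        best = None
        for seed in seeds:
            prof = seed_profile(seed, bg)
            for a in alphas:
                cand = ([(1 - a) * p for p in pis] + [a], profiles + [prof])
                ll = log_likelihood(data, *cand)
                if best is None or ll > best[0]:
                    best = (ll, cand)
        # Local search: refine the whole enlarged mixture with EM.
        pis, profiles = em(data, *best[1])
    return pis, profiles


def consensus(profile):
    """Most probable residue at each motif position."""
    return "".join(max(col, key=col.get) for col in profile)


if __name__ == "__main__":
    toy = ["AAWHKYAA", "CCWHKYCC", "GGWHKYGG", "DDWHKYDD"]
    pis, profiles = greedy_motifs(toy, w=4, n_motifs=1)
    print(consensus(profiles[1]), round(pis[1], 2))
```

On the toy input in the `__main__` block (four sequences sharing the substring `WHKY`), the single learned motif should have consensus `WHKY` and a mixing weight near 0.2, since 4 of the 20 windows are motif occurrences. Because earlier components stay in the mixture instead of having their occurrences erased, each new motif competes with them for windows during EM, which is the contrast with MEME that the abstract draws.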

Journal: Methods of Information in Medicine
Publication Date: Jan 1, 2004
Citations: 8

Similar Papers

Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm.
I Rigoutsos ... A Floratos
Bioinformatics | VOL. 14
01 Jan 1998

Identifying discriminative classification-based motifs in biological sequences
Celine Vens ... Etienne G J Danchin
Bioinformatics | VOL. 27
03 Mar 2011

Suffix tree characterization of maximal motifs in biological sequences
Maria Federico ... Nadia Pisanti
Theoretical Computer Science | VOL. 410
16 Jul 2009

HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences
Thanh Le ... Katheleen Gardiner
Bioinformatics | VOL. 26
08 Dec 2009