A data mining approach based on machine learning techniques to classify biological sequences

M Maddouri, M Elloumi

doi:10.1016/s0950-7051(01)00143-5

Abstract

In molecular biology, biological macromolecules such as deoxyribonucleic acid (DNA) and proteins are coded as strings called ‘primary structures’. For a long time, biologists have gathered these primary structures in large databases. They now focus on analyzing them to extract useful knowledge, and data mining approaches can help reach this goal. In this paper, we present a data mining approach based on machine learning techniques to classify biological sequences. The approach has four steps. (1) In the first step, we construct the set of discriminant substrings, called the discriminant descriptor (DD), associated with each family of primary structures. This construction relies on an adaptation of the Karp, Miller and Rosenberg (KMR) algorithm. (2) In the second step, we use the DDs constructed during the first step to code the families of primary structures as a table of examples vs attributes, called a ‘context’. (3) In the third step, we extract knowledge from the context constructed during the second step and represent it as production rules. This extraction uses an incremental production rules approach. (4) Finally, in the last step, we use the obtained production rules to classify primary structures.
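The four steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the adapted KMR algorithm or the incremental rule-extraction procedure, so the sketch substitutes naive substring enumeration and single-attribute rules. The function names, the toy sequences, the substring length bounds, and the strict definition of "discriminant" used here (present in every sequence of a family and in no sequence of any other family) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def substrings(seq: str, min_len: int, max_len: int) -> Set[str]:
    """All substrings of seq whose length lies in [min_len, max_len]."""
    return {
        seq[i:i + k]
        for k in range(min_len, max_len + 1)
        for i in range(len(seq) - k + 1)
    }


def discriminant_descriptors(
    families: Dict[str, List[str]], min_len: int = 2, max_len: int = 3
) -> Dict[str, Set[str]]:
    """Step 1 (simplified): naive enumeration stands in for the adapted KMR
    algorithm. Assumed criterion: a substring is discriminant for a family
    if it occurs in all of its sequences and in none of the others."""
    subs = {
        name: [substrings(s, min_len, max_len) for s in seqs]
        for name, seqs in families.items()
    }
    dds = {}
    for name, sets in subs.items():
        common = set.intersection(*sets)
        foreign = set().union(
            *(s for other, ss in subs.items() if other != name for s in ss)
        )
        dds[name] = common - foreign
    return dds


def build_context(
    families: Dict[str, List[str]], dds: Dict[str, Set[str]]
) -> Tuple[List[str], List[Tuple[List[int], str]]]:
    """Step 2: table of examples (sequences) vs attributes (substrings)."""
    attributes = sorted({d for ds in dds.values() for d in ds})
    rows = [
        ([int(a in seq) for a in attributes], name)
        for name, seqs in families.items()
        for seq in seqs
    ]
    return attributes, rows


def extract_rules(
    attributes: List[str], rows: List[Tuple[List[int], str]]
) -> Dict[str, str]:
    """Step 3 (simplified): one rule 'IF substring occurs THEN family' for
    each attribute whose positive examples all share a single family."""
    rules = {}
    for j, attr in enumerate(attributes):
        labels = {label for values, label in rows if values[j]}
        if len(labels) == 1:
            rules[attr] = labels.pop()
    return rules


def classify(seq: str, rules: Dict[str, str]) -> Optional[str]:
    """Step 4: majority vote over the rules that fire on seq."""
    votes = Counter(label for attr, label in rules.items() if attr in seq)
    return votes.most_common(1)[0][0] if votes else None


# Toy families (hypothetical data, not taken from the paper).
families = {"A": ["ACGTAC", "TACGTT"], "B": ["GGCATG", "CATGGA"]}
dds = discriminant_descriptors(families)
attributes, rows = build_context(families, dds)
rules = extract_rules(attributes, rows)
print(classify("TTACGA", rules))  # A
print(classify("ATGGCC", rules))  # B
```

A sequence that triggers no rule returns `None` in this sketch. A practical system would more likely relax the all-or-nothing criterion into frequency thresholds and use a voting or fallback strategy for unmatched sequences.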

Journal: Knowledge-Based Systems
Publication Date: Feb 23, 2002
Citations: 30

