Abstract

Knowledge extraction from genomic data is an important activity for biologists. To mine the underlying biological knowledge, we follow the Knowledge Discovery in Databases (KDD) process. In this paper, we transform DNA sequences into texts, which are indexed using n-grams weighted by TF-IDF. To group similar DNA sequences, we apply a bio-inspired clustering method based on 3D cellular automata. To analyze the clustering results, we translate each DNA sequence into an amino acid sequence according to the standard genetic code. We conclude that the clusters help biologists select DNA sequences that can produce a type of medicinal molecule and its various derivatives at low concentration in their composition.
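
The indexing and translation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the n-gram length of 3, the TF-IDF variant (relative term frequency times log(N/df)), and the function names `ngrams`, `tfidf`, and `translate` are all assumptions made for illustration. The 3D cellular automata clustering step is not sketched here, since the abstract gives no details of it.

```python
import math
from collections import Counter
from itertools import product


def ngrams(seq, n=3):
    """Return the overlapping character n-grams of a DNA string."""
    seq = seq.upper()
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def tfidf(sequences, n=3):
    """Return one {ngram: weight} dict per sequence.

    Assumed weighting: (count / total n-grams in the sequence) * log(N / df),
    where df is the number of sequences that contain the n-gram.
    """
    docs = [Counter(ngrams(s, n)) for s in sequences]
    df = Counter(g for d in docs for g in d)  # document frequency per n-gram
    total = len(docs)
    vectors = []
    for d in docs:
        length = sum(d.values())
        vectors.append(
            {g: (c / length) * math.log(total / df[g]) for g, c in d.items()}
        )
    return vectors


# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA string codon by codon, starting at position 0.

    Codons containing characters other than T, C, A, G map to 'X';
    an incomplete trailing codon is ignored.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )
```

For example, `translate("ATGGCCTAA")` returns `"MA*"`, and `tfidf(["ACGT", "ACGA"])` assigns zero weight to `ACG` because that 3-gram occurs in both sequences.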
