Motif discovery in unaligned DNA sequences using genetic algorithm

Al Muttakin, Mohammad Rezwanul Huq

doi:10.1109/icaee.2017.8255450

Abstract

Motif discovery in unaligned DNA sequences is a challenging problem in computer science and molecular biology. Finding a cluster of numerous similar subsequences in a set of biopolymer sequences is evidence that the subsequences occur not by chance but because they share some biological function. Motifs can be used to determine evolutionary and functional relationships. Over the past few decades, many motif discovery algorithms have been designed and developed into publicly available tools. In this paper, we present a motif discovery algorithm based on a Genetic Algorithm (GA). In our approach, we search for potential motifs in a group of DNA sequences of transcription start sites (TSS). Genetic operations such as mutation and crossover are performed with the help of a position weight matrix generated from a set of matched sequences. A rearrangement method is used to reduce the chance of a locally stable motif being selected over a globally stable motif. A preprocessing function relates randomly generated initial motifs to the promoter sequences, and a discursion function is used to reduce computation time. We evaluate our results based on a fitness score and the occurrence frequency of a candidate motif in a group of promoter sequences. Our approach gives better results than Finding Motif by Genetic Algorithm (FMGA), which itself showed superior results in comparison with two other motif-finding algorithms, namely Multiple EM for Motif Elicitation (MEME) and Gibbs Sampler.
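
To make the role of the position weight matrix concrete, the following is a minimal Python sketch of how such a matrix might be built from a set of matched subsequences and used to score a candidate motif. It is an illustration under stated assumptions, not the method of this paper: the function names (build_pwm, consensus, score), the pseudocount of 1, the averaged-probability score, and the example sequences are all hypothetical choices made here.

```python
from collections import Counter

BASES = "ACGT"


def build_pwm(matched, pseudocount=1.0):
    """Build per-position base probabilities from equal-length matched subsequences.

    The pseudocount (an assumption of this sketch) keeps unseen bases from
    getting probability zero.
    """
    length = len(matched[0])
    if any(len(s) != length for s in matched):
        raise ValueError("all matched subsequences must have the same length")
    total = len(matched) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in matched)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def consensus(pwm):
    """Return the most probable base at each position."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in pwm)


def score(pwm, candidate):
    """Average per-position probability of the candidate (a hypothetical fitness)."""
    return sum(col[b] for col, b in zip(pwm, candidate)) / len(pwm)


# Hypothetical matched subsequences, chosen only for illustration.
matched = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pwm = build_pwm(matched)
best = consensus(pwm)
```

In a GA setting, a matrix like this could guide mutation toward the more probable base at each position, and the score could serve as one ingredient of a fitness function. How the paper actually combines these is not stated in the abstract.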
