Genetic Algorithm Based Probabilistic Motif Discovery in Unaligned Biological Sequences

M Hemalatha, K Vivekanand

doi:10.3844/jcssp.2008.625.630

Abstract

Finding motifs in biosequences is among the most important primitive operations in computational biology. A motif discovery algorithm faces several computational constraints, such as memory requirements and computational complexity. To reduce the complexity of motif discovery, we propose an alternative solution that integrates a genetic algorithm with the Fuzzy ART machine learning approach and eliminates the multiple sequence alignment step.

Problem statement: More than a hundred methods have been proposed for motif discovery in recent years, representing a large variation in both the algorithmic approaches and the underlying models of regulatory regions. The aim of this study was to develop an alternative solution for motif discovery that benefits from both data mining and genetic algorithms while eliminating the cost of multiple sequence alignment.

Approach: A genetic algorithm based probabilistic motif discovery model was designed to solve the problem. The proposed algorithm was implemented in Matlab and tested on large DNA sequence data sets and synthetic data sets.

Results: The proposed model was compared with the existing method in terms of speed and motif length. The proposed method finds a motif of length 11 in 18 sec and a motif of length 15 in 24 sec, whereas the existing method needs 34 sec to find a motif of length 11. The proposed method therefore outperforms the popular existing method.

Conclusion: In this study, we proposed a model that discovers motifs in a large set of unaligned sequences in considerably less time, and the motifs found were also longer. The proposed algorithm was implemented in Matlab and tested on large DNA sequence data sets and synthetic data sets.
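As a rough illustration of how a genetic algorithm can search for a motif without first aligning the sequences, the sketch below evolves one candidate start position per sequence and scores each candidate by the information content of the resulting position profile. This is a generic sketch under stated assumptions, not the authors' Matlab implementation: the function names (`profile`, `fitness`, `evolve`), the truncation selection, the uniform crossover, the uniform background of 0.25 per base, and all parameter values are hypothetical choices, and the Fuzzy ART component is not modelled.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def profile(sequences, starts, width, pseudocount=1.0):
    """Column-wise base probabilities of the windows selected by `starts`."""
    columns = []
    for j in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for seq, start in zip(sequences, starts):
            counts[seq[start + j]] += 1
        total = sum(counts.values())
        columns.append({base: c / total for base, c in counts.items()})
    return columns


def fitness(sequences, starts, width):
    """Information content (bits) of the profile against a uniform background."""
    return sum(
        p * math.log2(p / 0.25)
        for column in profile(sequences, starts, width)
        for p in column.values()
    )


def evolve(sequences, width, pop_size=60, generations=150,
           mutation_rate=0.1, seed=0):
    """Evolve one start position per sequence; return (best_starts, best_score)."""
    rng = random.Random(seed)
    limits = [len(seq) - width + 1 for seq in sequences]

    def score(individual):
        return fitness(sequences, individual, width)

    population = [[rng.randrange(limit) for limit in limits]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each start position comes from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally re-draw a start position at random.
            child = [rng.randrange(limit) if rng.random() < mutation_rate else gene
                     for gene, limit in zip(child, limits)]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return best, score(best)
```

Calling `evolve(sequences, width=11)` on a list of DNA strings returns the best start positions found and their score. Because no alignment is computed, the cost per generation grows with the population size, the number of sequences, and the motif width rather than with the full sequence lengths.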

Highlights

Recent growth in bioinformatics has drawn many researchers' attention to this area.
Motif discovery can be described as follows: given a sample of sequences, can we find the unknown pattern that is implanted at different positions in those sequences?[1] The importance of these patterns for biology comes from the role of motifs at protein-DNA binding sites.
The aim of this study is to develop an alternative solution for motif discovery that benefits from both data mining and genetic algorithms while eliminating the cost of multiple sequence alignment.
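The implanted-pattern formulation in the highlights can be made concrete with a small synthetic example. The sketch below builds random DNA sequences, implants a known pattern at a random position in each, and scores a candidate pattern by its total Hamming distance to the closest window in every sequence. The helper names (`implant_motif`, `best_match`, `total_distance`) and the exact-copy implanting are illustrative assumptions, not taken from the paper.

```python
import random

ALPHABET = "ACGT"


def implant_motif(motif, n_seqs, seq_len, rng):
    """Build random DNA sequences and implant `motif` at a random start in each."""
    sequences, positions = [], []
    for _ in range(n_seqs):
        background = [rng.choice(ALPHABET) for _ in range(seq_len)]
        start = rng.randrange(seq_len - len(motif) + 1)
        background[start:start + len(motif)] = motif  # overwrite with the motif
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match(sequence, pattern):
    """Return (start, distance) of the window of `sequence` closest to `pattern`."""
    width = len(pattern)
    return min(
        ((i, hamming(sequence[i:i + width], pattern))
         for i in range(len(sequence) - width + 1)),
        key=lambda pair: pair[1],
    )


def total_distance(sequences, pattern):
    """Sum over all sequences of the distance to the closest window."""
    return sum(best_match(seq, pattern)[1] for seq in sequences)
```

A candidate with a total distance of 0 occurs exactly in every sequence. Real binding sites usually vary between sequences, so a discovery method looks for the pattern that minimises this kind of score rather than for exact matches.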

Summary

Introduction

Recent growth in bioinformatics has drawn many researchers' attention to this area. Biologists, computer scientists, and others from various fields have contributed research aimed at extracting more value from biological data. Motif discovery is one such use of biological data, and it is naturally among the popular topics in bioinformatics. Numerous studies have sought solutions for motif discovery. Stine et al.[5] employed a genetic algorithm in their structured Genetic Algorithm (StGA) to search for and discover highly conserved motifs among upstream sequences of co-regulated genes. Liu et al.[6] employed a genetic algorithm to find potential motifs in the regions of the Transcription Start Site (TSS). Pan et al.[7] developed the MacosFSpan and MacosVSpan algorithms to mine maximal frequent sequences in biological data. While MacosFSpan and MacosVSpan underline the inefficiency of apriori-like algorithms and seek a mining solution that works better on biological datasets[6,7], combine genetic

Journal: Journal of Computer Science
Publication Date: Aug 1, 2008
Citations: 12
License type: cc-by
