Abstract

Finding motifs in biological sequences is among the most important primitive operations in computational biology. A motif discovery algorithm faces several computational constraints, such as memory requirements and computational complexity. To overcome the complexity of motif discovery, we propose an alternative solution that integrates genetic algorithm and Fuzzy ART machine learning approaches in order to eliminate the multiple sequence alignment process.

Problem statement: More than a hundred methods have been proposed for motif discovery in recent years, representing a large variation in both the algorithmic approaches and the underlying models of regulatory regions. The aim of this study was to develop an alternative solution for motif discovery that benefits from both data mining and genetic algorithms while eliminating the cost incurred by multiple sequence alignment.

Approach: A genetic-algorithm-based probabilistic motif discovery model was designed to solve the problem. The proposed algorithm was implemented in Matlab and tested with large DNA sequence data sets and synthetic data sets.

Results: The proposed model was compared with an existing method in terms of speed and motif length. The proposed method finds a motif of length 11 in 18 sec and a motif of length 15 in 24 sec, whereas the existing method finds a motif of length 11 in 34 sec. On these measures, the proposed method outperforms the popular existing method.

Conclusion: In this study, we proposed a model that discovers motifs in a large set of unaligned sequences in considerably less time. The discovered motifs were also long. The proposed algorithm was implemented in Matlab and tested with large DNA sequence data sets and synthetic data sets.

Highlights

  • Recent growth in bioinformatics has drawn many researchers’ attention to this area

  • Motif discovery can be described as follows: given a sample of sequences, can we find the unknown pattern that is implanted at different positions in those sequences?[1] The importance of these patterns for biology comes from the role of motifs at protein-DNA binding sites

  • The aim of this study is to develop an alternative solution for motif discovery that benefits from both data mining and genetic algorithms while eliminating the cost incurred by multiple sequence alignment
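
To make the problem statement concrete, the following is a minimal Python sketch of a genetic algorithm that searches for a fixed-length motif in unaligned DNA sequences. It is an illustration under stated assumptions, not the authors' model: the study reports a Matlab implementation and a Fuzzy ART component, neither of which is reproduced here. The function names, the Hamming-distance fitness, and all parameter values (`pop_size`, `generations`, `rate`) are hypothetical choices made for this example.

```python
import random

ALPHABET = "ACGT"


def best_match_distance(motif, sequence):
    """Smallest Hamming distance between motif and any same-length window."""
    k = len(motif)
    return min(
        sum(a != b for a, b in zip(motif, sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    )


def fitness(motif, sequences):
    """Assumed fitness: negated total distance, so 0 is a perfect score."""
    return -sum(best_match_distance(motif, s) for s in sequences)


def mutate(motif, rate, rng):
    """Replace each base with a random base with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else base for base in motif
    )


def crossover(a, b, rng):
    """Single-point crossover of two equal-length candidates (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve_motif(sequences, k, pop_size=60, generations=80, rate=0.1, seed=0):
    """Evolve a length-k candidate motif; no sequence alignment is performed."""
    rng = random.Random(seed)
    # Seed the population with windows that actually occur in the data.
    population = []
    for _ in range(pop_size):
        s = rng.choice(sequences)
        start = rng.randrange(len(s) - k + 1)
        population.append(s[start:start + k])
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, sequences), reverse=True)
        elite = population[: pop_size // 4]  # keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rate, rng))
        population = elite + children
    return max(population, key=lambda m: fitness(m, sequences))
```

A call such as `evolve_motif(sequences, k=11)` returns the best candidate found for a list of DNA strings that are each at least 11 bases long. Because each candidate is scored against every window of every sequence, the sequences never need to be aligned with one another, which is the cost the study aims to avoid.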


Summary

Introduction

Recent growth in bioinformatics has drawn many researchers’ attention to this area. Biologists, computer scientists, and researchers from various other fields have contributed studies that aim to extract more from biological data. Motif discovery is one such use of biological data, and it is naturally among the popular topics in bioinformatics. Numerous studies have sought solutions for motif discovery. Stine et al.[5] employed a genetic algorithm in their structured Genetic Algorithm (StGA) to search for and discover highly conserved motifs among upstream sequences of co-regulated genes. Liu et al.[6] employed a genetic algorithm to find potential motifs in the regions of the Transcription Start Site (TSS). Pan et al.[7] developed the MacosFSpan and MacosVSpan algorithms to mine maximal frequent sequences in biological data. While MacosFSpan and MacosVSpan underline the inefficiency of apriori-like algorithms and seek a mining solution that works better on biological datasets[6,7], combine genetic

