Abstract

Quorum planted (l, d) motif search (qPMS) is a challenging computational problem in bioinformatics, used mainly to identify regulatory elements such as transcription factor binding sites in DNA sequences. Large DNA datasets play an important role in identifying high-quality (l, d) motifs, but most existing qPMS algorithms are too slow to process such datasets in a reasonable time. We propose an approximate qPMS algorithm called APMS that handles large DNA datasets mainly by accelerating neighboring substring search and filtering redundant substrings. Experimental results on large DNA datasets show that APMS not only identifies the implanted (l, d) motifs but also runs orders of magnitude faster than state-of-the-art qPMS algorithms. The source code of APMS and a Python wrapper for it are freely available at https://github.com/qyu071/apms.

Highlights

  • Transcription factors bind to specific sites in DNA sequences to initiate gene transcription and to control the transcription efficiency of genes.

  • For a particular transcription factor (TF), there may be multiple transcription factor binding sites (TFBSs) in DNA sequences. These TFBSs are usually similar to each other and share the same sequence pattern, called a DNA motif. Each of these TFBSs can be regarded as a motif instance, that is, a conserved occurrence of the motif in DNA sequences.

  • To solve planted motif search efficiently on large DNA datasets, we propose an approximate qPMS algorithm called APMS, built on new methods for generating seeds, refining seeds and filtering redundant seeds.

Summary

Introduction

Transcription factors bind to specific sites in DNA sequences to initiate gene transcription and to control the transcription efficiency of genes. These sites, typically 5 to 20 base pairs (bp) in length, are called transcription factor binding sites (TFBSs). Quorum planted (l, d) motif search (qPMS) [3], [4] is a well-known formulation of the problem of locating TFBSs in DNA sequences. For a particular transcription factor (TF), there may be multiple TFBSs in DNA sequences. These TFBSs are usually similar to each other and share the same sequence pattern, called a DNA motif.
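
To make the problem statement concrete, the sketch below checks whether a candidate string satisfies the commonly used qPMS condition: a string of length l counts as an (l, d) motif if it occurs with at most d mismatches in at least a fraction q of the input sequences. This is a brute-force check of the definition only, not the APMS algorithm; the function names, the toy sequences and the parameter values are hypothetical and chosen purely for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some length-l window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def count_hits(motif, sequences, d):
    """Number of sequences that contain at least one motif instance."""
    return sum(occurs_within(motif, s, d) for s in sequences)


def is_quorum_motif(motif, sequences, d, q):
    """True if motif has an instance in at least q * n of the n sequences."""
    return count_hits(motif, sequences, d) >= q * len(sequences)


# Toy data (hypothetical): four short sequences, candidate motif of length 4.
sequences = ["ACGTACGT", "TTACGAAC", "GGGGGGGG", "CACCTTTT"]
hits = count_hits("ACGT", sequences, d=1)
passes_q75 = is_quorum_motif("ACGT", sequences, d=1, q=0.75)
passes_q100 = is_quorum_motif("ACGT", sequences, d=1, q=1.0)
```

With these toy inputs, the candidate has an instance in three of the four sequences, so it passes at q = 0.75 but not at q = 1.0. Checking every candidate this way scales poorly with dataset size, which is the cost that approximate approaches aim to reduce.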

