Abstract

Identifying unknown transcription factor binding sites (TFBSs) is fundamental to understanding gene regulation and, more broadly, the mechanisms of life. The corresponding de novo motif discovery problem in bioinformatics is formulated as pattern discovery from strings. It is challenging in both modeling and optimization because the short TFBSs are weak signals in massive, noisy experimental data. While genetic algorithms have been widely applied to the problem, recent memetic algorithms (MAs), which employ local operators, have demonstrated superior effectiveness and efficiency. In this paper, we propose and study various MA components, including local operators and models for motif discovery, within a newly established MA framework. Their optimization and modeling capabilities are analyzed in depth on real datasets and their noisy versions. The selected best MAs show significantly improved performance over state-of-the-art methods in extensive tests, including a blind test on the eukaryotic benchmark. This paper is the first systematic study of MAs for de novo motif discovery, and the analyses of MA design highlight important issues. The comprehensive component categorization and the MA framework provide a useful platform for future MA development, especially for the newly emerging chromatin immunoprecipitation followed by sequencing (ChIP-seq) data.
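To make the general idea concrete, the following is a minimal sketch of a memetic algorithm for motif discovery: a genetic algorithm whose individuals are each improved by a local operator. It is not the framework or any specific operator from the paper. It assumes one binding site per sequence, a simple consensus-count score, and a greedy site-relocation local operator; all function names (`consensus_score`, `local_refine`, `memetic_search`, `plant_motif`) and parameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def consensus_score(seqs, starts, width):
    """Sum, over motif columns, of the count of the most frequent base."""
    score = 0
    for col in range(width):
        column = [s[p + col] for s, p in zip(seqs, starts)]
        score += Counter(column).most_common(1)[0][1]
    return score


def local_refine(seqs, starts, width):
    """Local operator (assumed): move one site at a time while the score strictly improves."""
    starts = list(starts)
    best = consensus_score(seqs, starts, width)
    improved = True
    while improved:
        improved = False
        for i, seq in enumerate(seqs):
            for p in range(len(seq) - width + 1):
                trial = starts[:i] + [p] + starts[i + 1:]
                score = consensus_score(seqs, trial, width)
                if score > best:
                    starts, best, improved = trial, score, True
    return starts, best


def memetic_search(seqs, width, pop_size=20, generations=15, seed=0):
    """Genetic search over site positions; every new individual is locally refined."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.randrange(len(s) - width + 1) for s in seqs]

    pop = [local_refine(seqs, random_individual(), width) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: ind[1], reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover on site positions, then a one-site mutation.
            child = [rng.choice(pair) for pair in zip(a[0], b[0])]
            i = rng.randrange(len(seqs))
            child[i] = rng.randrange(len(seqs[i]) - width + 1)
            children.append(local_refine(seqs, child, width))
        pop = parents + children
    return max(pop, key=lambda ind: ind[1])


def plant_motif(motif, n_seqs=8, length=40, seed=1):
    """Toy data: random background sequences with one exact motif copy each."""
    rng = random.Random(seed)
    seqs, sites = [], []
    for _ in range(n_seqs):
        bg = [rng.choice("ACGT") for _ in range(length)]
        pos = rng.randrange(length - len(motif) + 1)
        bg[pos:pos + len(motif)] = motif
        seqs.append("".join(bg))
        sites.append(pos)
    return seqs, sites


if __name__ == "__main__":
    seqs, sites = plant_motif("ACGTTGCA")
    starts, score = memetic_search(seqs, width=8)
    print("planted sites:", sites)
    print("found sites:  ", starts, "score:", score)
```

Real TFBS data are far noisier than this toy input, with degenerate sites and sequences that may contain zero or several sites, which is why the choice of model and local operator matters in practice.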
