Finding patterns in DNA sequences is one of the most difficult challenges in both molecular biology and computer science. Identifying regulatory motifs is critical for understanding gene expression. The essential concept in gene expression is that each gene encodes the instructions for making a protein. Expression begins when several protein factors, known as transcription factors, bind to enhancer and promoter sequences (Li and Li, 2019). Transcription is the first stage, in which an RNA "copy" of a section of the DNA is made. In the second stage, known as translation, this RNA sequence is read and interpreted to build a protein. Gene expression is the combined result of these two stages. Numerous regulatory transcription factors (TFs) control gene expression by binding to specific DNA regions known as transcription factor binding sites (TFBSs). Over the past ten years, the computational identification of TFBSs from DNA sequence data has emerged as a significant new method for understanding transcriptional regulatory networks (Ruzicka et al., 2017). Finding sequence motifs is challenging because intergenic regions are extremely long and highly varied, whereas sequence motifs are short (approximately 6–12 bp). Sequence motifs are frequently repeated and conserved, and they have a fixed length. These patterns are critical for identifying TFBSs, which aids in understanding the mechanisms governing gene expression [3]. Motifs can be classified as planted, structured, gapped, sequence, and network motifs (Hashim et al., 2019). An important problem in computational biology is the discovery of weak motifs. It is hard to solve because the occurrences of a weak motif can differ from the true motif at so many positions that spurious signals may mask the real ones.
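The weak-motif difficulty is often formalised through a common variant of the planted (l, d)-motif problem: find every l-mer that occurs, with at most d mismatches, in every input sequence. The sketch below is a brute-force illustration under that assumption only; it is not one of the cited methods, and all function names are hypothetical. It generates every d-mismatch neighbour of each l-mer in the first sequence (any valid motif must be among them) and keeps the candidates found in all sequences. The candidate set grows combinatorially with l and d, which is one reason exact enumeration becomes impractical for longer, weaker motifs.

```python
# Brute-force search for planted (l, d)-motifs: l-mers that occur with at most
# d mismatches in every input sequence. Function names are hypothetical.

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(pattern: str, seq: str, d: int) -> bool:
    """True if some window of seq is within Hamming distance d of pattern."""
    l = len(pattern)
    return any(hamming(pattern, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def neighbors(pattern: str, d: int) -> set[str]:
    """All DNA strings within Hamming distance d of pattern (including itself)."""
    if d == 0:
        return {pattern}
    if len(pattern) == 1:
        return set(ALPHABET)
    result = set()
    for suffix in neighbors(pattern[1:], d):
        if hamming(pattern[1:], suffix) < d:
            # Mismatch budget left: the first base may be any nucleotide.
            for base in ALPHABET:
                result.add(base + suffix)
        else:
            # Budget used up by the suffix: keep the original first base.
            result.add(pattern[0] + suffix)
    return result


def planted_motif_search(seqs: list[str], l: int, d: int) -> list[str]:
    """Return, sorted, every l-mer present with <= d mismatches in all seqs."""
    # Any valid motif must be a d-neighbour of some l-mer in the first sequence,
    # so those neighbours form a complete candidate set.
    first = seqs[0]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(m for m in candidates
                  if all(occurs_with_mismatches(m, s, d) for s in seqs))


if __name__ == "__main__":
    print(planted_motif_search(["ATTTGGC", "TGCCTTA", "CGGTATC", "GAAAATT"], 3, 1))
```

For example, with four short sequences, l = 3 and d = 1, the call lists every 3-mer that appears with at most one mismatch in each sequence; setting d = 0 reduces the search to exact shared l-mers.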
Regulatory elements are also hard to identify and uncover with computer algorithms because they are typically short and variable. The motif-finding problem is to discover, in a set of DNA sequences, overrepresented and conserved motifs that are good candidates for transcription factor binding sites. A transcription factor is a protein that regulates gene expression, specifically the initiation of transcription, the process that produces mRNA using DNA as a template. The common sequence pattern shared by a transcription factor's binding sites is called a motif. Finding motifs will aid the development of disease therapies and the understanding of disease susceptibility (Mohanty and Mohanty, 2020). Many techniques for analysing gene function start by finding DNA motifs. Finding TFBSs, which helps explain the mechanisms that control gene expression, is a crucial part of motif discovery. Over the years, a variety of algorithms have been used to develop fast and accurate motif discovery tools. These algorithms are typically categorised as probabilistic or consensus techniques, and many of them are slow to run and prone to getting stuck in local optima. Recently, both nature-inspired algorithms and a variety of combinatorial algorithms have been proposed to address these issues (Hashim et al., 2019).
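The consensus family of techniques, and its tendency to become trapped in local optima, can be illustrated with a greedy profile-based search in the style of textbook greedy motif search. The sketch below is an illustrative assumption, not any of the cited algorithms: function names are hypothetical, it assumes each sequence holds one motif occurrence of length k, it uses Laplace pseudocounts of 1, and it scores a motif set by its total Hamming distance to the consensus string. Because each k-mer choice is fixed once made, the search can settle on a suboptimal set, which is the kind of local optimum that stochastic and nature-inspired methods try to escape.

```python
# Greedy, profile-based consensus search (illustrative; names are hypothetical).
from collections import Counter

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus(motifs: list[str]) -> str:
    """Most frequent base in each column; ties go to the earlier letter in ACGT."""
    result = []
    for column in zip(*motifs):
        counts = Counter(column)
        result.append(max(ALPHABET, key=lambda b: counts[b]))
    return "".join(result)


def score(motifs: list[str]) -> int:
    """Total Hamming distance of the motifs to their consensus (lower is better)."""
    cons = consensus(motifs)
    return sum(hamming(cons, m) for m in motifs)


def profile(motifs: list[str], pseudocount: float = 1.0) -> dict[str, list[float]]:
    """Per-column base probabilities with Laplace pseudocounts."""
    k = len(motifs[0])
    counts = {b: [pseudocount] * k for b in ALPHABET}
    for m in motifs:
        for j, b in enumerate(m):
            counts[b][j] += 1
    total = len(motifs) + 4 * pseudocount
    return {b: [c / total for c in counts[b]] for b in ALPHABET}


def most_probable_kmer(seq: str, k: int, prof: dict[str, list[float]]) -> str:
    """The k-mer of seq with the highest probability under the profile."""
    best, best_p = seq[:k], -1.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        p = 1.0
        for j, b in enumerate(kmer):
            p *= prof[b][j]
        if p > best_p:
            best, best_p = kmer, p
    return best


def greedy_motif_search(seqs: list[str], k: int) -> list[str]:
    """Seed with each k-mer of seqs[0], then greedily pick one k-mer per sequence."""
    best = [s[:k] for s in seqs]
    for i in range(len(seqs[0]) - k + 1):
        motifs = [seqs[0][i:i + k]]
        for s in seqs[1:]:
            # Each choice is fixed once made, so the search can miss the global optimum.
            motifs.append(most_probable_kmer(s, k, profile(motifs)))
        if score(motifs) < score(best):
            best = motifs
    return best
```

With the toy input `["CCCCGATTACACCCC", "GATTACAGGGGGGGG", "CCGGCCGGGATTACA"]` and `k = 7`, the search recovers the planted `GATTACA` in every sequence; on real promoter data the seed-then-extend strategy offers no such guarantee.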