Abstract

Advances in genomics, together with new techniques and automated processes for DNA sequencing, have caused the amount of data produced to grow exponentially. Analyzing this data to identify interesting biological features is an enormous challenge, especially if it has to be done manually. Imagine trying to find a specific word in a book, say Don Quixote, by reading it word by word. How long would it take? Bioinformatics has played an important role in helping specialists analyze the data of a specific genome. The application of information technology, combined with techniques from applied mathematics, informatics, statistics, and computer science, has allowed interesting and important characteristics of genomes to be discovered. This makes it possible to understand and solve several biological problems, or at least to generate more knowledge or insight about a problem and the biological processes involved, which in turn can lead to advances in the techniques used. In computing, for example, processing text is an ordinary task. There are several problems involving strings, such as finding a specific word (we could say “aligning words”) or a similar one (matching a particular pattern of characters) in a text. When processing genomic data, if the goal is to search for a specific pattern (and its approximations) in DNA sequences, the natural approach is to reuse solutions that have already been implemented. Thus, for exact or approximate pattern search and similar problems, bioinformaticians have developed computational tools that apply well-known techniques and algorithms from computing to solve these important genomic problems. Sometimes the algorithms need to be adapted to account for specific features of the biological problem. Two good examples are sequence alignment and sequence assembly, processes that result from adapting algorithms to account for the insertion, deletion, and substitution of nucleotides in DNA sequences.
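To make the idea of approximate pattern search concrete, the following is a minimal sketch of one classic string technique, a dynamic-programming edit-distance search in the style of Sellers' algorithm, applied to a DNA string. It is an illustration under stated assumptions and not the method of any particular tool: the function name `approximate_matches`, its parameters, and the unit cost given to each insertion, deletion, and substitution are all hypothetical choices made for this example.

```python
def approximate_matches(pattern: str, text: str, max_edits: int) -> list[tuple[int, int]]:
    """Find where `pattern` occurs in `text` with at most `max_edits` edits.

    Returns (end, edits) pairs, where `end` is the exclusive end offset in
    `text` of a substring that matches `pattern`, and `edits` is the minimum
    number of insertions, deletions, and substitutions needed (unit costs
    are an assumption of this sketch).
    """
    m = len(pattern)
    # prev[i] = cheapest way to align pattern[:i] with a substring of the
    # text ending at the previous text position. Before any text is read,
    # aligning pattern[:i] costs i edits.
    prev = list(range(m + 1))
    hits = []
    for end, ch in enumerate(text, start=1):
        # curr[0] stays 0 because a match may start at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] <= max_edits:
            hits.append((end, curr[m]))
        prev = curr
    return hits


if __name__ == "__main__":
    # Exact search is the special case max_edits=0.
    print(approximate_matches("GATTACA", "CCGATTACACC", 0))
    # Allowing one edit tolerates a single substituted nucleotide.
    print(approximate_matches("GATTACA", "CCGATCACACC", 1))
```

The search runs in time proportional to the pattern length times the text length and keeps only two columns of the table in memory. Several neighbouring end positions can be reported for the same occurrence, since each is within the edit budget on its own.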
Some statistical and computational techniques, such as Hidden Markov Models (HMMs), Stochastic Grammars, and Conditional Random Fields (CRFs), have been successfully applied to the modeling, analysis, discovery, classification, and alignment of biological sequences (Yoon & Vaidyanathan, 2004, 2005). HMMs (Rabiner, 1989) and Stochastic Grammars (Sakakibara et al., 1994) are generative models for labeling sequences: they assign a joint probability distribution to, for example, the hidden gene structure y and the 7
