Abstract

Dynamic programming is a classical algorithmic paradigm that often allows a search space of exponential size to be evaluated in polynomial time. Its well-understood ingredients are recursive problem decomposition, tabulation of intermediate results for re-use, and Bellman’s Principle of Optimality. However, dynamic programming algorithms often lack abstraction and are difficult to implement, tedious to debug, and delicate to modify. The present article proposes a generic framework for specifying dynamic programming problems. This framework handles all kinds of sequential inputs as well as tree-structured data. Biosequence analysis, document processing, molecular structure analysis, comparison of hierarchically assembled objects and, in general, all domains where strings and ordered, rooted trees serve as natural data representations come under consideration. The new approach introduces inverse coupled rewrite systems (ICOREs). They describe the solutions of a combinatorial optimization problem as the inverse image of a term rewrite relation that reduces problem solutions to problem inputs. This formulation leads to concise yet translucent specifications of dynamic programming algorithms. Their actual implementation may be challenging, but we hope that it can eventually be produced automatically. The present article demonstrates the scope of the approach by describing a diverse set of dynamic programming problems from computational biology, with examples in biosequence and molecular structure analysis.
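To make "the inverse image of a term rewrite relation that reduces problem solutions to problem inputs" more concrete, here is a minimal Python sketch under our own simplifying assumptions. It does not use the ICORE notation of the article. Candidate solutions of unit-cost string comparison are encoded as tuples of edit operations. The hypothetical function `rewrite_to_inputs` reduces such a solution to the pair of strings it explains, and `inverse_image` enumerates every solution that reduces to a given input pair. The tabulating function `dp_min_cost` finds the same optimum as exhaustive search over that inverse image, but in polynomial time. All names and the encoding are illustrative.

```python
from functools import lru_cache

# Hypothetical term encoding of a candidate solution (an alignment):
#   ("R", a, b)  replace or match character a by b
#   ("D", a)     delete a from the first input
#   ("I", b)     insert b from the second input


def rewrite_to_inputs(alignment):
    """Reduce a candidate solution to the input pair it explains."""
    x, y = [], []
    for op in alignment:
        if op[0] == "R":
            x.append(op[1])
            y.append(op[2])
        elif op[0] == "D":
            x.append(op[1])
        else:  # "I"
            y.append(op[1])
    return "".join(x), "".join(y)


def inverse_image(x, y):
    """Yield every alignment that rewrite_to_inputs maps to (x, y)."""
    if not x and not y:
        yield ()
        return
    if x and y:
        for rest in inverse_image(x[1:], y[1:]):
            yield (("R", x[0], y[0]),) + rest
    if x:
        for rest in inverse_image(x[1:], y):
            yield (("D", x[0]),) + rest
    if y:
        for rest in inverse_image(x, y[1:]):
            yield (("I", y[0]),) + rest


def cost(alignment):
    """Unit-cost scoring (an assumption): matches are free, everything else costs 1."""
    return sum(0 if op[0] == "R" and op[1] == op[2] else 1 for op in alignment)


def dp_min_cost(x, y):
    """Same optimum as min(cost(a) for a in inverse_image(x, y)), via tabulation."""

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == len(x) and j == len(y):
            return 0
        options = []
        if i < len(x) and j < len(y):
            options.append(int(x[i] != y[j]) + best(i + 1, j + 1))
        if i < len(x):
            options.append(1 + best(i + 1, j))
        if j < len(y):
            options.append(1 + best(i, j + 1))
        return min(options)

    return best(0, 0)
```

For example, `inverse_image("GAT", "GT")` yields 25 candidate alignments. Each of them rewrites back to `("GAT", "GT")`, and the cheapest one costs 1 (delete `A`), which matches `dp_min_cost("GAT", "GT")`. The enumeration grows exponentially with input length, while the tabulated version does not.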

Highlights

  • Mapping from concrete to abstract is always the easier way

  • For the problem of structural matching, starting from the Inverse Coupled Rewrite System (ICORE) S2SGENERIC in Subsection 6.6, we have seen that fixing the first input to a target structure, disregarding base pair insertion, and internalization together lead to the ICORE COVARIANCEMODEL R, which exhibits the architecture of covariance models

  • The ICORE of the devil’s advocate would be illegal. (It is interesting to note the similarity of this argument to the discussion of the “yield parsing paradox” in [13], where Bellman’s Principle comes in to explain why we cannot solve all problems in classical Algebraic Dynamic Programming (ADP) in O(n^3), in spite of the Chomsky Normal Form transformation, which only seems to apply to all problems in the classical ADP framework.)
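The O(n^3) bound in the highlight above is the cost of parsing with a grammar in Chomsky Normal Form, where every rule has the form A -> B C or A -> a. The following minimal CYK recognizer is our own sketch, with a hypothetical function name and rule encoding. It shows where the cubic cost comes from: two nested loops over spans and a third over split points, giving O(n^3 · |G|) time. It only decides membership. It does not evaluate or optimize over derivations, and according to the highlight, that is where Bellman’s Principle limits the use of the CNF transformation in ADP.

```python
def cyk_recognize(word, terminal_rules, binary_rules, start):
    """Decide whether `word` is derivable from `start` in a CNF grammar.

    terminal_rules: dict mapping a terminal a to the set of nonterminals A with A -> a
    binary_rules:   list of triples (A, B, C) encoding rules A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # CNF rules A -> B C and A -> a never derive the empty word
    # chart[i][j] holds the nonterminals that derive the substring word[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, symbol in enumerate(word):
        chart[i][i + 1] = set(terminal_rules.get(symbol, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point: third loop, hence O(n^3)
                for a, b, c in binary_rules:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(a)
    return start in chart[0][n]


# Hypothetical example grammar in CNF for non-empty balanced parentheses:
#   S -> S S | L R | L X,  X -> S R,  L -> "(",  R -> ")"
DYCK_TERMINALS = {"(": {"L"}, ")": {"R"}}
DYCK_BINARY = [("S", "S", "S"), ("S", "L", "R"), ("S", "L", "X"), ("X", "S", "R")]
```

With this grammar, `cyk_recognize("(())", DYCK_TERMINALS, DYCK_BINARY, "S")` returns `True`, and `")("` is rejected.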


Summary

Motivation

In the field of biosequence analysis, combinatorial optimization problems on sequences and trees arise in never-ending variety. For determining similarity in genes and proteins, there is the “Needleman-Wunsch” alignment algorithm, referred to as “string edit distance” in the broader field of computer science [2,3]. It is used with a variety of scoring schemes that differ in their treatment of matches and mismatches, in their modeling of gaps, and in whether they minimize distance or maximize similarity. While there is much re-use of algorithmic ideas in combinatorial optimization problems on trees and sequences, this is not transparent in the way we represent concrete algorithms. Their formulation as dynamic programming algorithms requires us to integrate all problem aspects: construction of the search space, scoring, tabulation of intermediate results, and reporting of one or more solutions. It would be advisable to experiment with different approaches, but the high implementation effort prevents this.
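The interchangeable scoring schemes described above can be illustrated by a textbook global alignment recurrence whose scoring parameters and choice function are passed in as a separate object. This is a hedged Python sketch, not the article’s method. The `Scoring` class, its field names, and the two example schemes are hypothetical, and only linear gap costs are modeled (affine gap models need additional tables).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Scoring:
    """Hypothetical scoring scheme: per-column scores plus an objective."""
    match: int
    mismatch: int
    gap: int  # linear gap cost per inserted or deleted character
    choose: Callable[[Iterable[int]], int]  # min for distance, max for similarity


UNIT_EDIT_DISTANCE = Scoring(match=0, mismatch=1, gap=1, choose=min)
SIMPLE_SIMILARITY = Scoring(match=2, mismatch=-1, gap=-2, choose=max)


def global_align(x: str, y: str, s: Scoring) -> int:
    """Optimal global alignment score of x and y under scoring scheme s."""
    n, m = len(x), len(y)
    # table[i][j] holds the optimal score for the prefixes x[:i] and y[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = table[i - 1][0] + s.gap
    for j in range(1, m + 1):
        table[0][j] = table[0][j - 1] + s.gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = s.match if x[i - 1] == y[j - 1] else s.mismatch
            table[i][j] = s.choose((
                table[i - 1][j - 1] + sub,  # match or mismatch column
                table[i - 1][j] + s.gap,    # deletion from x
                table[i][j - 1] + s.gap,    # insertion from y
            ))
    return table[n][m]
```

With `UNIT_EDIT_DISTANCE`, `global_align("kitten", "sitting", UNIT_EDIT_DISTANCE)` gives the Levenshtein distance 3. Swapping in `SIMPLE_SIMILARITY` maximizes a similarity score with the same recurrence. Even so, search space construction, scoring, and tabulation remain entangled in one function, which is the situation the paragraph above describes.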

Overview
Previous Work
Article Organization
A Motivating Example
Machinery
Rewrite Systems
Rewriting Modulo Associativity
Tree Grammars
Algebras
ICORE Definitions
ICORE Pseudocode
An ICORE Exercise
Why Rewriting the Wrong Way?
ICOREs for Sequence Analysis
Standard Affine Gap Model
Variants of the Affine Gap Model
From Global Comparison to Local Search
Motif Searching
Semi-Global Alignment
Local Alignment
Approximate Motif Matching and HMMs
Approximate Matching
From Approximate Matching to Profile HMMs
Satellite Signatures for RNA Sequences and Structures
RNA Folding
Structural Alignment
Core Signature and Grammar for Structural Alignment
Structural Alignment in the Prism of Tree Alignment
Structural Alignment without Given Structures
Exact Consensus Structure
The Sankoff Algorithm
Sequence-to-Structure Alignment and RNA Family Modeling
A Generic Structure Matcher
Generic Exact Structure Matching
From Generic to Hard-Coded Structure Matching
A Structural Matcher for Exact Search
A Structural Matcher for Covariance Models
Conclusion on ICOREs for RNA Analysis
Tree Comparison and Related Problems
Notations and Signatures
Tree Alignment
Classical Tree Alignment
Tree Alignment with Affine Gap Scoring
Classical Tree Edit Distance
Variations on Tree Edit Distance and Alignment of Trees
Tree Alignment under a Generalized Edit Model
Evaluation Algebras and Their Products
Relation with Other Formal Models
Letter Transducers Seen as ICOREs
Multi-Tape S-Attribute Grammars
Relations between Turing Machines and ICOREs
A Trade-off between Grammar and Rewrite Rules
Conclusions
ICOREs as a Declarative Specification Framework
Research Challenges in ICORE Implementation