Abstract

Alignments, i.e., position-wise comparisons of two or more strings or ordered lists, are of utmost practical importance in computational biology and a host of other fields, including historical linguistics and emerging areas of research in the Digital Humanities. The problem is well known to be computationally hard as soon as the number of input strings is not bounded. Due to its practical importance, a huge number of heuristics have been devised, which have proved very successful in a wide range of applications. Alignments have nevertheless received hardly any attention as formal, mathematical structures. Here, we focus on the compositional aspects of alignments, which underlie most algorithmic approaches to computing alignments. We also show that the concepts naturally generalize to finite partially ordered sets and partial maps between them that in some sense preserve the partial orders. As a consequence of this discussion, we observe that alignments of even more general structures, in particular graphs, are essentially characterized by the fact that the restriction of an alignment to a row must coincide with the corresponding input graph. Pairwise alignments of graphs are therefore determined completely by common induced subgraphs. In this setting, alignments of alignments are well-defined, and alignments can be decomposed recursively into subalignments. This provides a general framework within which different classes of alignment algorithms can be explored for objects very different from sequences and other totally ordered data structures.

Highlights

  • Alignments play an important role in particular in bioinformatics as a means of comparing two or more strings by explicitly identifying correspondences between letters as well as insertions and deletions [13]

  • Most commonly a scoring model is defined for pairs of sequences and generalized to multiple alignments as sums over certain pairwise alignments that are obtained as projections

  • In this contribution we have analyzed the compositional properties of sequence alignments and explored the generalization to much more general structures

Summary

Introduction

Alignments play an important role, particularly in bioinformatics, as a means of comparing two or more strings by explicitly identifying correspondences between letters (usually called matches and mismatches) as well as insertions and deletions [13]. The pairwise scoring is usually specified either in terms of matches or in terms of edit operations (insertions, deletions, or substitutions). In this contribution, we will almost completely disregard the scoring of alignments and instead focus on the structure of (multiple) alignments as combinatorial objects. Following a brief discussion of the view of alignments as compositions of pairwise matching relations, we further generalize the formalism to include first ordered trees, then directed and undirected graphs, and finally essentially arbitrary finite spaces that admit well-behaved subspace constructions. We shall conclude that alignments can alternatively be specified in terms of common induced subgraphs (or the corresponding common induced subspaces in full generality).
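To make the objects under discussion concrete, the following is a minimal sketch of a sequence alignment as a list of gapped rows. It is not taken from the paper: the gap symbol `-`, the function names `restrict_to_row`, `project`, and `column_types`, and the convention of calling a gap in the first row an "insertion" are all assumptions chosen for illustration. The sketch shows that removing gaps from a row recovers the input string, and that a pairwise alignment is obtained from a multiple alignment by projection, i.e., by keeping two rows and dropping columns that contain only gaps.

```python
GAP = "-"  # assumed gap symbol, not prescribed by the source


def restrict_to_row(alignment, i):
    """Remove gap characters from row i, recovering the input string."""
    return "".join(c for c in alignment[i] if c != GAP)


def project(alignment, rows):
    """Keep only the given rows and drop columns that are all gaps in them."""
    selected = [alignment[r] for r in rows]
    kept = [col for col in zip(*selected) if any(c != GAP for c in col)]
    if not kept:
        return ["" for _ in rows]
    return ["".join(chars) for chars in zip(*kept)]


def column_types(pairwise):
    """Label each column of a pairwise alignment (labels are a convention)."""
    top, bottom = pairwise
    labels = []
    for a, b in zip(top, bottom):
        if a == GAP:
            labels.append("insertion")  # letter present only in the second row
        elif b == GAP:
            labels.append("deletion")   # letter present only in the first row
        elif a == b:
            labels.append("match")
        else:
            labels.append("mismatch")
    return labels


# A hypothetical multiple alignment of the strings ACGT, ACGA, and ACTG.
alignment = [
    "AC-GT",
    "AC-GA",
    "ACTG-",
]

inputs = [restrict_to_row(alignment, i) for i in range(len(alignment))]
pair_01 = project(alignment, [0, 1])  # the shared all-gap column is dropped
pair_02 = project(alignment, [0, 2])  # no column is dropped here
```

In this toy example, projecting onto the first two rows removes the third column, because both rows carry a gap there, whereas projecting onto the first and third rows leaves all five columns in place. Scoring is deliberately left out, in line with the focus on alignments as combinatorial objects.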

A Very Brief Review of Sequence Alignments
Formal Definitions of Sequence Alignments
Alignments of Partially Ordered Sets
Composition of Alignments
Blockwise Decompositions
Recursive Construction
Alignments as Relations
Tree Alignments
Alignments of Graphs
Alignments for General Structures
Concluding Remarks