Abstract

Measuring the distance between two bacterial genomes under the inversion process is usually done by assuming that all inversions occur with equal probability. Recently, an approach to calculating inversion distance using group theory was introduced, and it is effective for the model in which only very short inversions occur. In this paper, we show how to use the group-theoretic framework to establish minimal distance for any weighting on the set of inversions, generalizing previous approaches. To do this we use the theory of rewriting systems for groups, and exploit the Knuth–Bendix algorithm, the first time this theory has been introduced into genome rearrangement problems. The central idea of the approach is to use existing group-theoretic methods to find an initial path between two genomes in genome space (for instance using only short inversions), and then to deform this path to optimality using a confluent system of rewriting rules generated by the Knuth–Bendix algorithm.

Highlights

  • Large scale changes in the arrangement of genes within a chromosome abound in biology and are key agents of sequence evolution (Belda et al, 2005; Beckmann et al, 2007)

  • The problem of determining a sequence of inversion operations that transforms π into π′ is equivalent to the problem of expressing the group element π′π⁻¹ as a product of the group elements corresponding to the rearrangement operators

  • For a finite group, the Knuth-Bendix algorithm will give us the requisite set of rewriting rules. The upshot of this observation is that for a genome rearrangement model, where the rearrangement operators are invertible, the Knuth-Bendix algorithm is guaranteed to generate a finite, confluent, terminating rewriting system since we are dealing with finite groups
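To make the idea of a confluent, terminating rewriting system concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the group is the symmetric group on three regions, the generators `a` and `b` are hypothetical names for the two adjacent swaps, and the three rules were written down by hand rather than produced by running Knuth–Bendix. Because this small system is confluent and terminating, every word reduces to the same normal form regardless of the order in which rules are applied.

```python
# Hand-written rewriting rules for S_3 with generators a, b (assumed names
# for the two adjacent swaps on three regions). Each rule replaces its
# left-hand side with a shorter or lexicographically smaller word.
RULES = [("aa", ""), ("bb", ""), ("bab", "aba")]


def reduce_word(word, rules=RULES):
    """Rewrite `word` until no rule applies and return the normal form."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            idx = word.find(lhs)
            if idx != -1:
                # Replace the leftmost occurrence of lhs with rhs.
                word = word[:idx] + rhs + word[idx + len(lhs):]
                changed = True
                break
    return word


if __name__ == "__main__":
    # A five-letter path between two arrangements collapses to one swap.
    print(reduce_word("babab"))
```

In this toy system all generators have equal weight, so the normal form is simply a shortest word; a weighted model would need an ordering on words that reflects the weights.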


Summary

INTRODUCTION

Large scale changes in the arrangement of genes within a chromosome abound in biology and are key agents of sequence evolution (Belda et al, 2005; Beckmann et al, 2007). If the weight assigned to an inversion event represents the probability of that event, a model where all events have the same weight can be thought of as finding distances under the uniform distribution. This model is used in the Hannenhalli and Pevzner (1999) approach, which draws a graph based on the genomes and calculates the minimal distance as a function of features of the graph (for example the number of cycles and paths). A study of Yersinia pestis has found that all inversions were shorter than expected under a neutral model (Darling et al, 2008). In view of this information, a natural extension to the definition of rearrangement distance, one that assigns weights (derived from empirical information) to the rearrangement operators and calculates the minimal weighted distance between genome arrangements, might be a better approximation of the underlying biology. Throughout the paper, we will focus on determining the minimal weighted reversal distance.
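The notion of minimal weighted reversal distance can be illustrated on very small genomes by brute force. The sketch below is not the rewriting-system method of the paper: it swaps in Dijkstra's shortest-path algorithm on the graph whose vertices are arrangements and whose edges are inversions. It assumes a linear, unsigned genome, and the function names and the `weight` callback (which maps an inversion's length to its cost) are hypothetical choices made for this example.

```python
import heapq


def apply_inversion(genome, i, j):
    """Reverse the segment genome[i..j] (inclusive) of a tuple."""
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


def weighted_distance(source, target, weight):
    """Minimal total weight of inversions turning source into target.

    weight(k) is the assumed cost of inverting a segment of k regions.
    Exhaustive search: only feasible for a handful of regions.
    """
    source, target = tuple(source), tuple(target)
    n = len(source)
    moves = [(i, j) for i in range(n) for j in range(i + 1, n)]
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, genome = heapq.heappop(heap)
        if genome == target:
            return dist
        if dist > best[genome]:
            continue  # stale heap entry
        for i, j in moves:
            nxt = apply_inversion(genome, i, j)
            cand = dist + weight(j - i + 1)
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return float("inf")


if __name__ == "__main__":
    # Uniform weights: reversing the whole genome is a single event.
    print(weighted_distance((1, 2, 3, 4), (4, 3, 2, 1), lambda k: 1.0))
    # Short inversions strongly favoured: adjacent swaps win instead.
    print(weighted_distance((1, 2, 3, 4), (4, 3, 2, 1),
                            lambda k: 1.0 if k == 2 else 100.0))
```

Under uniform weights the two arrangements are one inversion apart, while under a weighting that penalises long inversions the cheapest path uses six adjacent swaps. This shows how the choice of weights changes both the distance and the optimal path.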

Overview of the Framework Introduced in This Paper
GROUP THEORETIC INVERSION SYSTEMS
Genomes as Permutations and Inversion as an Action
Inversion Systems
Weighted Length
GROUP PRESENTATIONS
WORDS ON A CAYLEY GRAPH
REWRITING SYSTEMS
Termination
Confluence
IMPLEMENTATION AND BIOLOGICAL EXAMPLES
DISCUSSION AND FUTURE
DATA AVAILABILITY STATEMENT
