Abstract

Rooted trees are ubiquitous data structures used to model hierarchical objects in a wide range of application domains. Various downstream analysis tasks require measures that quantify the (dis-)similarity between rooted trees. Many such measures exist, e.g., the widely used tree edit distance (TED). However, few algorithms compute (dis-)similarity measures that are specifically designed for rooted, unordered, node-labeled trees and that support input trees of different orders. To close this gap in the literature, we introduce the edge-preservation similarity (EPS). We show how to compute EPS exactly via integer quadratic programming on small instances and present a scalable 4-approximation algorithm. An evaluation on tree representations of pseudoknotted RNA secondary structures and acyclic molecular graphs shows that both exact and approximate (normalized) EPS preserve functional similarities between the compared RNAs and molecules better than the often-used TED. Python implementations of our algorithms and scripts to reproduce the results are available on GitHub: https://github.com/bionetslab/edge-preservation-similarity.
