Abstract

Trees are compared in many areas using a range of statistical and dissimilarity measures. Because data differ across domains, and a particular comparison approach may suit some applications better than others, different comparison approaches need to be evaluated. Gathering real data is often an extensive task, so generated trees allow faster evaluation of proposed solutions. This paper presents three algorithms for generating random trees, parametrized by tree size, by shape based on the node distribution, and by the amount of difference between generated trees. The algorithms were motivated by unordered trees created from class hierarchies in object-oriented programs. The presented algorithms are evaluated with statistical and dissimilarity measures to observe their stability, behavior, and impact on node distribution. The results of the evaluation with dissimilarity measures show that the algorithms are suitable for tree comparison.

Introduction

Trees as data structures are used to represent hierarchically organized data in various areas, such as pattern recognition [1,2], phylogenetic studies [3], structured documents [4], and problem-solving by searching [5]. One of the key tasks is to compare trees, e.g., by tree characteristics and dissimilarity measures. Tree characteristics such as tree height and node degree, and conclusions based on these characteristics, such as branching factor [5], can provide insight into the relationships in the data. We briefly describe tree comparison based on dissimilarity measures and on tree characteristics such as height and degree, and then elaborate on the statistical data about trees that are relevant to the presented algorithms. The contributions of this paper are three algorithms for generating unordered trees: random trees, trees that follow a given node distribution corresponding to the data in the problem domain, and trees created by modifying an existing tree.
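
As a rough illustration of the tree characteristics mentioned above, the following Python sketch builds a random unordered tree of a given size and computes its height and the distribution of node degrees. Everything in it is an assumption made for illustration: the random-attachment scheme, the parent-list representation, the reading of "degree" as the number of children, and the names `random_tree`, `height`, and `degree_distribution`. It is not one of the algorithms presented in this paper.

```python
import random
from collections import Counter

def random_tree(size, seed=None):
    """Return a parent list: parent[i] is the parent of node i; node 0 is the root."""
    if size < 1:
        raise ValueError("size must be at least 1")
    rng = random.Random(seed)
    parent = [None]  # the root has no parent
    for node in range(1, size):
        # Illustrative choice: attach the new node to a uniformly chosen existing node.
        parent.append(rng.randrange(node))
    return parent

def height(parent):
    """Height = number of edges on the longest root-to-leaf path."""
    depth = [0] * len(parent)
    for node in range(1, len(parent)):
        # Parents always have smaller indices, so depth[parent[node]] is already final.
        depth[node] = depth[parent[node]] + 1
    return max(depth)

def degree_distribution(parent):
    """Map each degree (here: number of children) to the count of nodes with it."""
    children = Counter(p for p in parent if p is not None)
    return Counter(children.get(node, 0) for node in range(len(parent)))

tree = random_tree(20, seed=1)
print("height:", height(tree))
print("degree distribution:", dict(degree_distribution(tree)))
```

Since sibling order is never stored, the parent list describes an unordered tree. The degree counts always sum to the tree size, and the degrees weighted by their counts sum to the number of edges, which is the size minus one.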

Dissimilarity Measures
Statistical Measures
Random Variants
Generating Trees by Node Distribution
Experiments
Statistical and Edit Distance Measurements
Distortion Parameters
Node Distribution
Related Work
Findings
Conclusions