Abstract

Many modern applications and systems represent and exchange data in tree-structured form, and they process and produce large tree datasets. Discovering informative patterns in large tree datasets is an important research area with many practical applications. We propose a novel approach that exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally, avoiding the costly enumeration of all pattern matchings required by previous approaches. To reduce space consumption, the matching information of already computed patterns is materialized as bitmaps. We further optimize our basic support computation method with an algorithm that incrementally generates the bitmaps of the embeddings of a new candidate pattern without first explicitly computing those embeddings. Our extensive experimental results on real and synthetic large-tree datasets show that our approach achieves orders-of-magnitude performance improvements over a state-of-the-art tree mining algorithm and a recent graph mining algorithm.
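The abstract does not spell out the bitmap layout, so the following is only a minimal illustrative sketch, not the algorithm of the paper. It assumes that each bit marks a data-tree node position at which a pattern's root can be matched, and that a candidate pattern is formed from two already computed parent patterns sharing that root. Under those assumptions, intersecting the parents' bitmaps gives a superset of the candidate's root occurrences, which can be used to bound its support before any exact matching. All function names and the sample positions are hypothetical.

```python
def to_bitmap(positions):
    """Pack non-negative node positions into an integer used as a bitmap."""
    bitmap = 0
    for pos in positions:
        bitmap |= 1 << pos
    return bitmap


def from_bitmap(bitmap):
    """Unpack a bitmap into the sorted list of positions whose bit is set."""
    positions = []
    pos = 0
    while bitmap:
        if bitmap & 1:
            positions.append(pos)
        bitmap >>= 1
        pos += 1
    return positions


def support(bitmap):
    """Count the set bits, i.e. the number of recorded occurrences."""
    return bin(bitmap).count("1")


def candidate_bound(parent_a, parent_b):
    """Intersect two parent bitmaps.

    Assumption: a node can be a root occurrence of the candidate only if it
    is a root occurrence of both parents, so the result is an upper bound
    on the candidate's occurrences, not the exact answer.
    """
    return parent_a & parent_b


# Hypothetical root occurrences of two parent patterns in one data tree.
parent_a = to_bitmap([0, 3, 5, 9])
parent_b = to_bitmap([3, 4, 9, 12])
bound = candidate_bound(parent_a, parent_b)
```

In this sketch a candidate whose bound already falls below the support threshold could be discarded without running a matching algorithm at all; the exact bitmaps of surviving candidates would still have to be computed by the mining method itself.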
