Abstract

This paper addresses the problem of timestamped event sequence matching, a new type of similar sequence matching that retrieves the occurrences of interesting patterns from timestamped sequence databases. The sequential-scan-based method, the trie-based method, and the method based on the iso-depth index are well-known approaches to this problem. In this paper, we point out their shortcomings, and propose a new method that effectively overcomes these shortcomings. The proposed method employs an R ∗-tree, a widely accepted multi-dimensional index structure that efficiently supports timestamped event sequence matching. To build the R ∗-tree, this method extracts time windows from every item in a timestamped event sequence and represents them as rectangles in n-dimensional space by considering the first and last occurring times of each event type. Here, n is the total number of disparate event types that may occur in a target application. To resolve the dimensionality curse in the case when n is large, we suggest an algorithm for reducing the dimensionality by grouping the event types. Our sequence matching method based on the R ∗-tree performs with two steps. First, it efficiently identifies a small number of candidates by searching the R ∗-tree. Second, it picks out true answers from the set of candidates. We prove its robustness formally, and also show its effectiveness via extensive experiments.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.