Abstract

Predicting events in a sequence of events is a challenging task that can be approached with data mining. In this paper, we focus on the specific case of early prediction of distant events. We aim to mine episode rules whose consequent is temporally distant from the antecedent and whose antecedent is as small as possible, both in number of events and in occurrence duration. We refer to these rules as essential rules. To reach this goal, we propose an original algorithm, DEER (Distant and Essential Episode Rules). This algorithm differs from traditional algorithms in three respects. First, it determines the consequent of episode rules at an early stage of the mining process, which makes it possible to mine rules with an antecedent as small as possible. Second, it applies a minimal gap constraint between the antecedent and the consequent to guarantee a suitable distance between the two. Third, its stop criterion considers both the support and the confidence of the rules, unlike traditional algorithms, which use only the support. Experiments on both synthetic and real datasets show that DEER runs faster than several state-of-the-art algorithms and scales well to large datasets. Furthermore, by studying in detail the episode rules mined from a real dataset of blog messages, we demonstrate not only the efficiency of our algorithm in mining interesting essential rules with a distant consequent, but also that these rules can be used to accurately predict distant events.
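
To make the notions of minimal gap, support, and confidence concrete, the following is a minimal brute-force sketch, not the DEER algorithm itself. It assumes a time-sorted event sequence, a serial (ordered) antecedent that must complete within `max_span` time units, and a consequent that must occur between `min_gap` and `max_gap` time units after the antecedent ends. The names `antecedent_end`, `rule_stats`, `max_span`, `min_gap`, and `max_gap` are hypothetical, and the occurrence-counting scheme is an illustrative assumption that may differ from the definition used in the paper.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type); the sequence is assumed time-sorted


def antecedent_end(events: List[Event], start: int,
                   antecedent: Sequence[str], max_span: int) -> Optional[int]:
    """Return the end timestamp of the earliest in-order match of `antecedent`
    beginning at index `start`, or None if no match fits within `max_span`."""
    t0, first = events[start]
    if first != antecedent[0]:
        return None
    pos, end = 1, t0
    for t, e in events[start + 1:]:
        if pos == len(antecedent):
            break
        if t - t0 > max_span:
            return None  # antecedent would last too long
        if e == antecedent[pos]:
            pos += 1
            end = t
    return end if pos == len(antecedent) else None


def rule_stats(events: List[Event], antecedent: Sequence[str], consequent: str,
               max_span: int, min_gap: int, max_gap: int) -> Tuple[int, float]:
    """Count antecedent occurrences and those followed by the consequent
    within [min_gap, max_gap] after the antecedent ends.
    Returns (support of the rule, confidence of the rule)."""
    n_antecedent = 0
    n_rule = 0
    for i in range(len(events)):
        end = antecedent_end(events, i, antecedent, max_span)
        if end is None:
            continue
        n_antecedent += 1
        # Minimal gap constraint: the consequent must be far enough away.
        if any(e == consequent and min_gap <= t - end <= max_gap
               for t, e in events):
            n_rule += 1
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    return n_rule, confidence


# Toy sequence (made up for illustration only).
toy = [(1, "a"), (2, "b"), (3, "x"), (8, "c"),
       (10, "a"), (11, "b"), (13, "c"),
       (20, "a"), (22, "b"), (30, "c")]
support, confidence = rule_stats(toy, ("a", "b"), "c",
                                 max_span=3, min_gap=5, max_gap=10)
```

In this toy sequence the antecedent occurs three times, but the second occurrence is followed by "c" only two time units later, which violates the minimal gap, so it does not count toward the rule. A stop criterion in the spirit of the abstract would then compare both values against thresholds rather than the support alone.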
