An interdisciplinary comparison of sequence modeling methods for next-element prediction

Niek Tax,Sebastiaan J Van Zelst,Irene Teinemaa

doi:10.1007/s10270-020-00789-3

Niek Tax, Sebastiaan J Van Zelst + Show 1 more

Open Access

PDF Available

https://doi.org/10.1007/s10270-020-00789-3

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

Data of sequential nature arise in many application domains in the form of, e.g., textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) In the machine learning field methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks, (ii) in process mining process discovery methods aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal: learning a model that accurately captures the sequential behavior in the underlying data. Those sequence models are generative, i.e., they are able to predict what elements are likely to occur after a given incomplete sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling methods on the task of next-element prediction on four real-life sequence datasets. The results indicate that machine learning methods, which generally do not aim at model interpretability, tend to outperform methods from the process mining and grammar inference fields in terms of accuracy.

Full Text