Smart process repositories have been developed to effectively manage large collections of business process models. Several key features of such repositories require accurate matching of corresponding activities between a pair of process models. However, existing matching techniques have not yet achieved the desired accuracy, which impedes the effectiveness of process repositories. To address this gap, this paper proposes a word-embedding-based approach that significantly improves the accuracy of matching. For a comprehensive evaluation of the proposed approach, we performed experiments using three state-of-the-art word embeddings, two syntactic measures, six semantic measures, and four datasets. The results show that the use of word embeddings outperforms all of the syntactic and semantic similarity measures. Moreover, the use of fastText-based embeddings in our proposed technique achieves the highest F1 score, ahead of both Word2vec- and GloVe-based embeddings.