Interactive optimization of embedding-based text similarity calculations

Daniel Witschard, Andreas Kerren, Kostiantyn Kucher, Ilir Jusufi, Rafael M Martins

doi:10.1177/14738716221114372

Open Access

Abstract

Comparing text documents is an essential task for a variety of applications in diverse research fields, and several methods have been developed for it. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges remain. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings instead of only one, we show that it is possible to achieve higher-quality similarity calculations, which in turn is a key factor for developing high-performing applications that exploit text similarity. We also provide a prototype visual analytics tool that helps the analyst find optimally performing ensembles and gain insights into the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our key ideas to fields beyond the scope of text analysis.
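
To make the idea of combining several embeddings more concrete, the following minimal sketch computes an ensemble similarity score for two documents. The abstract does not specify how the individual embeddings are combined, so the weighted mean of per-embedding cosine similarities used here is an assumption for illustration only, not necessarily the method of the paper. The names `cosine_similarity`, `ensemble_similarity`, `emb_1`, and `emb_2`, as well as the toy vectors and weights, are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def ensemble_similarity(
    doc_a: Dict[str, Vector],
    doc_b: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Weighted mean of per-embedding cosine similarities.

    Assumption: each document is represented once per embedding, keyed by
    a (hypothetical) embedding name, and the ensemble is a weighted mean.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        weight * cosine_similarity(doc_a[name], doc_b[name])
        for name, weight in weights.items()
    )
    return weighted_sum / total_weight


# Toy example: the two embeddings disagree about how similar the documents are.
doc_a = {"emb_1": [1.0, 0.0], "emb_2": [1.0, 1.0]}
doc_b = {"emb_1": [0.0, 1.0], "emb_2": [1.0, 1.0]}
weights = {"emb_1": 1.0, "emb_2": 3.0}

score = ensemble_similarity(doc_a, doc_b, weights)
print(round(score, 6))  # approximately 0.75
```

In this sketch, the weights are the quantity an analyst would tune: changing them shifts how strongly each embedding influences the final score, which is the kind of choice an interactive tool could help explore.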

Full Text