Comparison of preprocessing approaches for text data in digital shop floor management systems

Marvin Müller, Lukas Longard, Joachim Metternich

doi:10.1016/j.procir.2022.04.030

Abstract

In an increasing number of production companies, shop floor management (SFM) is supported by digital systems. The data generated while working with these systems can be used by assistance systems to further enhance the value of digital SFM. Several assistance systems that use text data from problem-solving processes have been suggested, but their quality has been limited by the characteristics of the domain-specific language: short texts with spelling errors and the use of synonyms. This research aims to quantify how much different preprocessing approaches can improve the quality of such assistance systems. For this purpose, and to enable comparison within the research community, a public, labeled data set is needed. This paper introduces such a data set based on the characteristics identified in three real industry data sets. To overcome the problems in text processing of shop floor data (e.g., domain-specific synonyms), several approaches are suggested, tested, and compared to a generic approach for text clustering. The study identifies best practices for handling shop floor text data and provides a data set with the goal of simplifying and stimulating research on this topic.

Full Text