In the age of modern information technology, large amounts of data are generated every second to enable subsequent data aggregation and analysis. However, the IT infrastructures that have been set up over the last few decades, and that are now expected to serve this purpose, are highly heterogeneous and complex. As a result, data analysis tasks such as collecting, searching, understanding and processing data become very time-consuming. This makes it difficult to realize visions such as the Internet of Production, which aims to guarantee the availability of real-time information at any time and place in an industrial setting. To reduce the time to analytics in such scenarios, we present a data ingestion, integration and processing approach consisting of a flexible and configurable data ingestion pipeline and a semantic data platform named ESKAPE. The ingestion pipeline provides an abstraction over all tasks related to data acquisition. Its main goal is therefore controllable access to the data and meta information contained in machines and other systems on the shop floor. Additionally, it can forward the collected data to a configurable endpoint, such as a data lake. ESKAPE acts as one such endpoint, enabling semantic data integration and processing. By annotating data sets with semantic models originating from the Semantic Web, data analysts are able to understand, process and discover these data sets more efficiently. ESKAPE features a three-layered information storage architecture consisting of a data layer for storing integrated raw data sets, a layer containing user-defined semantic models that describe the contextual knowledge necessary to interpret the stored data, and a top layer formed by a continuously evolving knowledge graph that combines semantic information from all present semantic models. Based on this storage system, ESKAPE enables flexible annotation as well as efficient search and processing of data sources without losing the ability to analyze and query the underlying raw data with analytic tools. We present and discuss our approach, its benefits and its limitations based on a real-world industrial use case.