Abstract

Knowledge graphs are important for industrial digitalization. Industrial knowledge graphs are often mapped from multiple existing large data sources, and creating a mapping requires the time of scarce subject matter experts (SMEs). Interactive, literate programming for large-scale mapping would allow mapping engineers to make good use of SME time and to document their work. Currently, there are no open source tools supporting such a process. To solve this problem, we implement maplib, which leverages existing tooling from data science. In data science, literate programming is widely used, with frameworks such as Jupyter notebooks, to interactively prepare data and create analyses using in-memory tables called DataFrames. Maplib is implemented in Rust using Polars DataFrames and has Python bindings, which gives mapping engineers access to this tooling. Maplib implements the OTTR mapping language, which is highly suited for industrial use cases. Maplib features a SPARQL engine defined directly on DataFrames, making querying possible immediately after mapping. We evaluate our approach by comparing mapping and querying performance with Morph-KGC and SPARQL Anything on the GTFS Madrid benchmark. Our approach materializes the graph and is ready to query 47x-182x faster, and scales to models that are over twice as large. Morph-KGC and SPARQL Anything perform better for most, but not all, of the queries once the graph has been constructed. On the end-to-end task of mapping and querying, however, which is very important for interactive mapping, maplib performs better for most queries.
