Clinical data warehouses provide harmonized access to healthcare data for medical researchers. Informatics for Integrating Biology and the Bedside (i2b2) is a well-established open-source solution with the major benefit that data representations can be tailored to support specific use cases. These data representations can be defined and improved via an iterative approach together with domain experts and the medical researchers using the platform. To facilitate these discussions, it is important to understand how users interact with the system. The objective of this work was to develop metrics for describing user interactions with clinical data warehouses in general and i2b2 in particular. Moreover, we aimed to develop a dashboard featuring interactive visualizations that inform data engineers and data stewards about potential improvements. We first identified metrics for different data usage dimensions and extracted the relevant metadata about previous user queries from the i2b2 database schema for further analysis. We then implemented associated visualizations in Python and integrated the results into an interactive dashboard using Dash. The identified categories of metrics include frequency of use, session duration, and use of functionality and features. We created a dashboard that extends our local i2b2 data warehouse platform, focusing on the last of these categories, further broken down into the number of queries, frequently queried concepts, and query complexity. The implementation is available as open-source software. A range of metrics can be derived from metadata logged in the i2b2 database schema to provide data engineers and data stewards with a comprehensive understanding of how users interact with the platform. This can help to identify the strengths and limitations of specific instances of the platform for specific use cases and aid their iterative improvement.