Introduction
With the advent of artificial intelligence, the secondary use of routinely collected medical data from electronic health records (EHRs) has become increasingly popular. However, different EHR systems typically use different names for the same medical concepts, which hampers scalable model development and subsequent clinical implementation for decision support. Mapping the original parameter names to an ontology, a standardized set of predefined concepts, is therefore necessary, but it is time-consuming and labor-intensive. We propose an augmented intelligence approach that facilitates ontology alignment by predicting the correct concepts from parameter names in raw EHR data exports.

Methods
We used the manually mapped parameter names from the multicenter “Dutch ICU data warehouse against COVID-19”, sourced from three types of EHR systems, to train machine learning models for concept mapping. Data from 29 intensive care units, comprising 38,824 parameters mapped to 1,679 relevant and unique concepts and 38,069 parameters labeled as irrelevant, were used for model development and validation. We used the Natural Language Toolkit (NLTK) to preprocess the parameter names based on WordNet cognitive synonyms, which were then transformed by term frequency-inverse document frequency (TF-IDF) into numeric features. We then trained linear classifiers using stochastic gradient descent for multi-class prediction. Finally, we fine-tuned these predictions using information on the distributions of the data associated with each parameter name, through similarity score and skewness comparisons.

Results
The initial model, trained using data from one hospital organization for each of the three EHR systems, scored an overall top-1 precision of 0.744, recall of 0.771, and F1-score of 0.737 on a total of 58,804 parameters.
Leave-one-hospital-out analysis returned an average top-1 recall of 0.680 for relevant parameters, which increased to 0.905 when the top 5 predictions were considered. When the training dataset was reduced to include only relevant parameters, top-1 recall was 0.811 and top-5 recall was 0.914 for relevant parameters. Performance improvement based on similarity score or skewness comparisons affected at most 5.23% of numeric parameters.

Conclusion
Augmented intelligence is a promising method to improve concept mapping of parameter names from raw electronic health record data exports. We propose a robust method for mapping data across various domains, facilitating the integration of diverse data sources. However, recall is not perfect, and therefore manual validation of the mapping remains essential.
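
To make the top-1 and top-5 recall figures in the Results concrete, the following minimal Python sketch shows one way a top-k recall could be computed. It is an illustration under stated assumptions, not the study's evaluation code: it assumes top-k recall is the fraction of parameters whose manually mapped concept appears among the k highest-ranked predictions, and the function name `top_k_recall`, the concept labels, and the toy data are all hypothetical.

```python
def top_k_recall(ranked_predictions, true_concepts, k):
    """Fraction of parameters whose true concept is among the top-k predictions.

    ranked_predictions: one list per parameter, concepts ordered best first.
    true_concepts: the manually mapped concept for each parameter.
    Assumption: this per-parameter definition is illustrative only and may
    differ from the exact metric used in the study.
    """
    if len(ranked_predictions) != len(true_concepts):
        raise ValueError("inputs must have the same length")
    if not true_concepts:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_concepts)
        if truth in ranked[:k]
    )
    return hits / len(true_concepts)


# Hypothetical toy data: four parameters, three ranked suggestions each.
ranked = [
    ["heart_rate", "pulse", "resp_rate"],
    ["resp_rate", "heart_rate", "spo2"],
    ["spo2", "pao2", "fio2"],
    ["fio2", "peep", "pao2"],
]
truth = ["heart_rate", "heart_rate", "pao2", "tidal_volume"]

top1 = top_k_recall(ranked, truth, 1)  # only the first parameter is a hit
top3 = top_k_recall(ranked, truth, 3)  # the first three parameters are hits
print(top1, top3)
```

In an augmented intelligence workflow, a higher top-k recall means that the correct concept is more often present in the short list shown to the human reviewer, even when the first suggestion is wrong.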