Abstract

For all research data collected, data descriptions and information about the corresponding variables are essential for data analysis and reuse. To enable cross-study comparisons and analyses, semantic interoperability of metadata is one of the most important requirements. In the area of clinical and epidemiological studies, data collection instruments such as case report forms (CRFs), data dictionaries and questionnaires are critical for metadata collection. Even though data collection instruments are often created in digital form, they are mostly not machine readable; i.e., they are not semantically coded. As a result, comparing data collection instruments is complex. The German project NFDI4Health is dedicated to the development of a national research data infrastructure for personal health data, and as such searches for ways to enhance semantic interoperability. Retrospective integration of semantic codes into study metadata is important, as ongoing or completed studies contain valuable information. However, this is labor intensive and should be eased by software. To understand the market and find out which techniques and technologies support retrospective semantic annotation/enrichment of metadata, we conducted a literature review. In NFDI4Health, we identified basic requirements for semantic metadata annotation software in the biomedical field and in the context of the FAIR principles. Ten relevant software systems were summarized and aligned with those requirements. We concluded that despite active research on semantic annotation systems, no system meets all requirements. Consequently, further research and software development in this area are needed, as interoperability of data dictionaries, questionnaires and data collection tools is key to reusing and combining results from independent research studies.

Highlights

Accepted: 30 December 2021

  • A central focus of research data management is the representation of data

  • We describe the challenges for a semantic metadata annotation service and the basic requirements that derive from the health and epidemiology domain, especially from the NFDI4Health project

  • Each of the ten metadata annotation services we found is described in detail

Introduction

A central focus of research data management is the representation of data. This concerns the readability of the data, especially of its semantic information, which should be unambiguous to both humans and computers. In clinical and epidemiological studies, for example, case report forms, data dictionaries and questionnaires play an important role. They are used either to collect information or to document existing data. Because they are created by people to be consumed by peers rather than by machines, data analysis is more complex than necessary and data integration is a major issue.
