The Kanta Patient Data Repository (PDR) contains healthcare data covering the population of Finland for more than a decade. The repository is a continuously expanding real-world dataset produced by many information systems and healthcare service providers. Kanta data has been available for secondary uses, such as scientific research, since 2019. The data can be requested from the Finnish authority Findata. However, before a request is accepted, it is difficult to assess whether the accumulated data allows a specific research question to be answered. Publicly available descriptions of the data structures in the Kanta PDR do not indicate how extensively these structures are used in practice. This publication supports future data use cases by providing an overview of the availability of types of structured health data in the Kanta PDR, based on a sample of 96 200 medical histories of patients over 18 years of age. We conclude that the Kanta PDR is a promising source of real-world data for the development and evaluation of medical risk calculators within the Finnish population. The wide coverage of the Finnish population and the timeliness of the data are its strengths as a source of research data, also outside the Finnish context. However, limitations on data availability at the variable level need to be considered on a case-by-case basis. The main challenges in using data from the Kanta PDR are multiple code systems for laboratory results, short durations of recorded data for specific data types, and structured formats that are missing or very rarely used, e.g., for tobacco and alcohol use.