Abstract

Data quality issues can significantly hinder research reproducibility, data sharing, and reuse. Research data repositories (RDRs) are at the forefront of addressing these issues. This study conducted a systematic analysis of data quality assurance (DQA) practices in RDRs, guided by activity theory and the data quality literature, and used the results to conceptualize a data quality assurance model (DQAM) for RDRs. DQAM outlines a DQA process comprising evaluation, intervention, and communication activities, and categorizes 17 quality dimensions into intrinsic and product‐level data quality. It also details specific improvement actions for data products and identifies the essential roles, skills, standards, and tools for DQA in RDRs. A comparison of DQAM with existing DQA models highlights its potential to improve those models by adding a specific DQA activity structure. The theoretical contribution of the study is a systematic conceptualization of DQA work in RDRs that is grounded in a comprehensive analysis of the literature and refines the understanding of how DQA integrates into broader frameworks of RDR evaluation. In practice, DQAM can inform the design and development of DQA workflows and tools. As a future research direction, the study suggests applying and evaluating DQAM across various domains to validate and refine the model further.