Purpose
Data quality assurance (DQA) is essential for enabling the sharing and reuse of research data, especially given the increasing focus on data transparency, reproducibility, credibility and validity in research. Although the literature on research data curation is vast, there remains a lack of theory-guided exploration of DQA modeling in research data repositories (RDRs).

Design/methodology/approach
This study addresses this gap by examining 12 distinct cases of DQA-related knowledge organization tools, including four metadata vocabularies, three metadata schemas, one ontology and four standards used to guide DQA work in RDRs.

Findings
The study analyzed the cases utilizing a theoretical framework based on activity theory and data quality literature and synthesized a model and a knowledge artifact, a DQA ontology (DQAO, Lee et al., 2024), that encodes a DQA theory for RDRs. The ontology includes 127 classes, 44 object properties, 7 data properties and 18 instances. The article also uses problem scenarios to illustrate how the DQAO can be integrated into the FAIR ecosystem.

Originality/value
The study provides valuable insights into DQA theory and practice in RDRs and offers a DQA ontology for designing, evaluating and integrating DQA workflows within RDRs.