Articles published on Data Documentation Initiative
35 Search results
- Research Article
- 10.3390/a18080490
- Aug 6, 2025
- Algorithms
- Giannis Vassiliou + 4 more
Despite the significant volume of empirical research found in student-authored academic theses—particularly in the social sciences—these works are often poorly documented and difficult to discover within institutional repositories. A key reason for this is the lack of appropriate metadata frameworks that balance descriptive richness with usability. General standards such as Dublin Core are too simplistic to capture critical research details, while more robust models like the Data Documentation Initiative (DDI) are too complex for non-specialist users and not designed for use with student theses. This paper presents the design and validation of a lightweight, web-based metadata framework specifically tailored to document empirical research in academic theses. We are the first to adapt existing hybrid Dublin Core–DDI approaches specifically for thesis documentation, with a novel focus on cross-methodological research and non-expert usability. The model was developed through a structured analysis of actual student theses and refined to support intuitive, structured metadata entry without requiring technical expertise. The resulting system enhances the discoverability, classification, and reuse of empirical theses within institutional repositories, offering a scalable solution to elevate the visibility of the gray literature in higher education.
- Research Article
- 10.1108/ajim-12-2024-0959
- May 29, 2025
- Aslib Journal of Information Management
- Anna Sendra + 2 more
Purpose: The aim of this study is to examine research data management practices among scholars in the social sciences and humanities who engage in data-intensive research. Additionally, the study extends an existing data lifecycle model tailored to these disciplines by incorporating scholars’ perceived needs for research data support services. Design/methodology/approach: Semi-structured interviews (n = 21) were conducted with scholars of various levels of experience in data-intensive social sciences and humanities research. A qualitative content analysis focused on research data management practices was applied to the material. Findings: Unmet needs in terms of existing infrastructure (e.g. repositories) and services are affecting the research data management practices in data-intensive social sciences and humanities research, where less common tasks include data sharing and reuse. Based on these perceived requirements, an improved version of the Data Documentation Initiative Lifecycle that includes the support needs required for effectively managing data throughout the research process is developed. Originality/value: The study contributes to improving the development of research data services aimed at data-intensive social sciences and humanities research by presenting a research activity model that better represents, from the perspective of scholars, the evolving research data management practices in these disciplines. The study also provides a deeper understanding of the support needs derived from the increasing digitalization of social sciences and humanities research.
- Research Article
- 10.29300/mkt.v9i2.5669
- Dec 25, 2024
- AL Maktabah
- Muhammad Yusrizal + 2 more
This study aims to evaluate metadata quality in the digital repository of Universitas Islam Negeri (UIN) Fatmawati Soekarno Bengkulu. As more and more university libraries rely on digital repositories to manage their collections, metadata has become essential for giving users additional information and guidance. The study evaluates the current metadata against government quality standards and identifies areas that need improvement. It uses metadata standards such as Dublin Core, DataCite, and the Data Documentation Initiative (DDI) as relevant frameworks for this purpose. The methodology follows the Open Archival Information System (OAIS) concept and takes a qualitative approach. The study involved interviews and an analysis of metadata in the repository to identify problems related to terminological consistency and completeness of information. To improve metadata quality, the proposed approach includes training for repository managers, establishing metadata consistency criteria, and implementing monitoring mechanisms. Better metadata quality will improve the user experience and streamline the management of digital collections. The approach developed in this study can serve as a blueprint for other organizations that want to improve metadata quality in their repositories. The study contributes to the literature on digital collection administration and is a valuable tool for repository administrators and researchers interested in metadata production.
- Research Article
1
- 10.3389/fdata.2024.1435510
- Oct 11, 2024
- Frontiers in big data
- Bylhah Mugotitsa + 12 more
Longitudinal studies are essential for understanding the progression of mental health disorders over time, but combining data collected through different methods to assess conditions like depression, anxiety, and psychosis presents significant challenges. This study presents a mapping technique allowing for the conversion of diverse longitudinal data into a standardized staging database, leveraging the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets. The "INSPIRE" project integrates longitudinal data from African studies into a staging database using metadata documentation standards structured with a snowflake schema. This facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves. Utilizing this mapping method, we streamlined the data migration process to the staging database, enabling subsequent integration into the OMOP CDM. Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research. The staging database serves as an innovative tool in managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into OMOP CDM.
- Research Article
- 10.2218/ijdc.v18i1.937
- Aug 13, 2024
- International Journal of Digital Curation
- Deirdre Lungley + 2 more
This paper explores an enhanced curation lifecycle being developed at the UK Data Service (UKDS), with our Data Product Builder. Through a Graphical User Interface, we aim to provide the researcher with a tailored digital resource. We detail the threefold motivation behind this initiative: data dissemination scalability, researcher satisfaction and the reduction of nationwide duplication of research effort. Subsequent sections detail the technical components and challenges involved. In addition to more standard data subsetting, filtering and linking components, this data dissemination platform offers dynamic disclosure assessments – identifying combinations of variables that present a potential disclosure risk. All components are underpinned by the Data Documentation Initiative’s new Cross-Domain Integration standard (DDI-CDI), designed to handle the many structures in which data may be organised. Ever conscious of the scale of the task we are embarking on, we remain motivated by the need for such advances in data dissemination and optimistic of the feasibility of such a system to meet the needs of the researcher while balancing the data disclosivity concerns of the data depositor.
- Research Article
4
- 10.2196/56237
- Aug 1, 2024
- Online Journal of Public Health Informatics
- David Amadi + 10 more
Background: Metadata describe and provide context for other data, playing a pivotal role in enabling findability, accessibility, interoperability, and reusability (FAIR) data principles. By providing comprehensive and machine-readable descriptions of digital resources, metadata empower both machines and human users to seamlessly discover, access, integrate, and reuse data or content across diverse platforms and applications. However, the limited accessibility and machine-interpretability of existing metadata for population health data hinder effective data discovery and reuse. Objective: To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications. Methods: The framework implements a 3-stage approach. The first stage is Data Documentation Initiative (DDI) integration, which involves leveraging the DDI Codebook metadata and documentation of detailed information for data and associated assets, while ensuring transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization. In this stage, the data are harmonized and standardized into the OMOP CDM, facilitating unified analysis across heterogeneous data sets. The third stage involves the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD), in which machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using the Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya. Results: The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the use of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD codes are accessible via a GitHub repository and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website. Conclusions: The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance diverse resource visibility, accessibility, and utility, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide.
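The third stage described in this abstract (Schema.org metadata serialized as JSON-LD and embedded in a page) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the dataset title, and the variable names below are hypothetical, and the only external facts relied on are the Schema.org `Dataset` type, its `variableMeasured` property, and the `application/ld+json` script type.

```python
import json


def dataset_to_jsonld(title, description, keywords, variables):
    """Build a Schema.org Dataset description as a JSON-LD dictionary.

    `variables` is an iterable of (name, label) pairs, for example taken
    from a codebook. All inputs here are illustrative assumptions.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "keywords": list(keywords),
        "variableMeasured": [
            {"@type": "PropertyValue", "name": name, "description": label}
            for name, label in variables
        ],
    }


def embed_in_html(jsonld):
    """Wrap a JSON-LD dictionary in the script tag that crawlers look for."""
    body = json.dumps(jsonld, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical example values, invented for illustration only.
example = dataset_to_jsonld(
    title="Hypothetical surveillance dataset",
    description="Weekly case counts by district (illustrative only).",
    keywords=["population health", "surveillance"],
    variables=[("case_count", "Number of reported cases"),
               ("district", "Reporting district")],
)

if __name__ == "__main__":
    print(embed_in_html(example))
```

Keeping the metadata as a plain dictionary until the last step means the same structure can be validated, stored, or embedded without re-parsing.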
- Research Article
- 10.29173/iq1118
- Jun 26, 2024
- IASSIST Quarterly
- Janete Saldanha Bach Estevao + 1 more
The dynamic relationships among survey instruments and study entities like questionnaires, variables, questions, and response formats evolve in Social Sciences surveys. Researchers may need to modify variable attributes such as labels or names, question-wording, or response scales when reusing variables in survey design. Therefore, explaining these relations across different waves and studies is necessary to track how variables relate to each other. Although standards like Data Documentation Initiative – Lifecycle (DDI-LC) and DataCite model these relationships, these frameworks fall short of capturing the complexity of variable relationships. The DDI Alliance Controlled Vocabulary for Commonality Type employs codes (such as 'identical,' 'some,' and 'none') to outline shifts in entities like variables; however, this approach is insufficient for disambiguating these relationships since it does not differentiate the variable attributes subject to change. We introduce the GESIS Controlled Vocabulary (CV) for Variables in Social Sciences Research Data to bridge this gap. This CV is designed to enhance semantic interoperability across various organizations and systems. Establishing explicit relationships facilitates harmonization across different study waves and enriches data reuse. This enhancement supports advanced search and browse functionalities. The CV, published via the CESSDA vocabulary manager, seeks to forge a semantically rich, interconnected knowledge graph specifically tailored for Social Science Research. This endeavour aligns with the FAIR data principles, aiming to foster a more integrated and accessible research landscape.
- Research Article
1
- 10.1007/978-1-0716-3449-3_18
- Jul 3, 2023
- Methods in molecular biology (Clifton, N.J.)
- Ivan Pribec + 6 more
This chapter discusses the challenges and requirements of modern Research Data Management (RDM), particularly for biomedical applications in the context of high-performance computing (HPC). The FAIR data principles (Findable, Accessible, Interoperable, Reusable) are of special importance. Data formats, publication platforms, annotation schemata, automated data management and staging, the data infrastructure in HPC centers, file transfer and staging methods in HPC, and the EUDAT components are discussed. Tools and approaches for automated data movement and replication in cross-center workflows are explained, as well as the development of ontologies for structuring and quality-checking of metadata in computational biomedicine. The CompBioMed project is used as a real-world example of implementing these principles and tools in practice. The LEXIS project has built a workflow-execution and data management platform that follows the paradigm of HPC-Cloud convergence for demanding Big Data applications. It is used for orchestrating workflows with YORC, utilizing the data documentation initiative (DDI) and distributed computing resources (DCI). The platform is accessed by a user-friendly LEXIS portal for workflow and data management, making HPC and Cloud Computing significantly more accessible. Checkpointing, duplicate runs, and spare images of the data are used to create resilient workflows. The CompBioMed project is completing the implementation of such a workflow, using data replication and brokering, which will enable urgent computing on exascale platforms.
- Front Matter
- 10.29173/iq1086
- Mar 30, 2023
- IASSIST Quarterly
- Karsten Boye Rasmussen
Welcome to the first issue of IASSIST Quarterly for the year 2023, IQ vol. 47(1). The last article in this issue has in the title the FAIR acronym that stands for Findable, Accessible, Interoperable, and Reusable. These are the concepts most often focused on by our articles in the IQ and FAIR has an extra emphasis in this issue. The first article introduces and demonstrates a shared vocabulary for data points where the need arose after confusions about data and metadata. Basically, I find that the most valuable virtue of well-structured data (I deliberately use a fuzzy term to save you from long excursions here in the editor's notes) is that other well-structured data can benefit from use of the same software. Similarly, well-structured metadata can benefit from the same software. I also see this as the driver for the second article, on time series data and description. Sometimes, the software mentioned is the same software in both instances as metadata is treated as data or vice versa. This allows for new levels of data-driven machine actions. These days universities are busy investigating and discussing the latest chatbots. I find many of the approaches restrictive and prefer to support the inclusive ones. Likewise, I also expect and look forward to bots having great relevance for the future implementation of FAIR principles. Rasmussen, Karsten Boye (2023) Editor's notes: FAIR BOT. As metadata is data is metadata is data ..., IASSIST Quarterly 47(1), pp. 1-2.
- Research Article
- 10.29173/iq1051
- Mar 30, 2023
- IASSIST Quarterly
- George Alter + 2 more
Sharing data across scientific domains is often impeded by differences in the language used to describe data and metadata. We argue that disagreements over the boundary between data and metadata are a common source of confusion. Information appearing as data in one domain may be considered metadata in another domain, a process that we call “semantic transposition.” To promote greater understanding, we develop new terminology for describing how data and metadata are structured, and we show how it can be applied to a variety of widely used data formats. Our approach builds upon previous work, such as the Observations and Measurements (ISO 19156) data model. We rely on tools from the Data Documentation Initiative’s Cross Domain Integration (DDI-CDI) to illustrate how the same data can be represented in different ways, and how information considered data in one format can become metadata in another format.
- Research Article
- 10.29173/iq1038
- Mar 30, 2023
- IASSIST Quarterly
- Dan Gillman + 1 more
The US Bureau of Labor Statistics (BLS) is undertaking initiatives to improve its data and metadata systems. Planning for the replacement of the public facing LABSTAT data query system and efforts within the Office of Productivity and Technology to combine multiple production systems within a single cross-divisional database platform are examples. BLS views time-series data as a combination of three elemental components found in every time-series. A measure element; a person, places, and things element; and a time element are the components. The authors turned this basic approach into a formal conceptual model represented in UML (Unified Modeling Language). The UML model describes a flexible multi-dimensional data structure, of which time-series are a kind, and supports any kind of query into the data. The Office of Productivity and Technology has adopted the model, and it is guiding their approach moving forward. The model was also adopted by the Financial Industry Business Ontology project under the Object Management Group and by the Data Documentation Initiative Cross-Domain Integration (DDI-CDI) development project. There are other similarities between the OPT effort and DDI-CDI as well. In this way, the OPT project demonstrates the feasibility and usefulness of many of the ideas in DDI-CDI. In this paper we describe the time-series formulation and the UML conceptual model. Then, the design of the OPT system and some of its features are described, relating those that are like DDI-CDI where appropriate. In doing so, we provide a thorough understanding of the structure of time-series.
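The three-element view of a time series in this abstract (a measure, a person/place/thing, and a time) can be sketched as a small multi-dimensional store in which a time series is the slice obtained by fixing the first two elements. This is only an illustration of the idea, not the BLS UML model or DDI-CDI: the class names `CellKey` and `Cube`, the field names, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CellKey:
    """Identifies one data point by the three elements named in the abstract."""
    measure: str  # what is measured (hypothetical label)
    entity: str   # the person, place, or thing described
    period: str   # the time element, as a sortable string such as "2023-01"


class Cube:
    """A minimal multi-dimensional structure keyed by (measure, entity, period)."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, float] = {}

    def put(self, measure: str, entity: str, period: str, value: float) -> None:
        self._cells[CellKey(measure, entity, period)] = value

    def series(self, measure: str, entity: str) -> List[Tuple[str, float]]:
        """Return the time series for one measure and entity, ordered by period."""
        return sorted(
            (key.period, value)
            for key, value in self._cells.items()
            if key.measure == measure and key.entity == entity
        )


# Invented sample values, for illustration only.
cube = Cube()
cube.put("hours_worked", "sector_A", "2023-02", 101.5)
cube.put("hours_worked", "sector_A", "2023-01", 100.0)
cube.put("hours_worked", "sector_B", "2023-01", 80.0)

if __name__ == "__main__":
    print(cube.series("hours_worked", "sector_A"))
```

Because every cell carries all three elements, the same store can also be sliced by period or by measure, which is the sense in which a time series is one kind of multi-dimensional structure.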
- Research Article
2
- 10.18438/eblip29913
- Sep 15, 2021
- Evidence Based Library and Information Practice
- Shawn W Nicholson + 1 more
Objective – This study uses quantitative methods to determine if the metadata requirements of institutional repositories (IRs) promote data discovery. This question is addressed through an exploration of an international sample of university IRs, including an analysis of the required metadata elements for data deposit, with a particular focus on how these metadata support discovery of research data objects. Methods – The researchers worked with an international universe of 243 IRs. A codebook of 10 variables was developed to enable analysis of the eventual randomly derived sample of 40 institutions. Results – The analysis of our sample IRs revealed that most had metadata standards that offered weak support for data discovery—an unsurprising revelation in view of the fact that university IRs are meant to accommodate deposit and storage of all types of scholarly outputs, only a small percentage of which are research data objects. Most IRs seem to have adopted metadata standards based on the Dublin Core schema, while none of the IRs in our sample used the Data Documentation Initiative metadata that is better suited for deposit and discovery of research datasets. Conclusion – The study demonstrates that while data deposit can be accommodated by the existing metadata requirements of multi-purpose IRs, their metadata practices do little to prioritize data deposit or to promote data discovery. Evidence indicates that data discovery will benefit from additional metadata elements.
- Research Article
- 10.29173/iq984
- Sep 23, 2020
- IASSIST Quarterly
- Karsten Rasmussen
Knowing what to do and how to do it: High transparency and careful curation of data and metadata
- Research Article
- 10.29173/iq922
- Jul 18, 2018
- IASSIST Quarterly
- Karsten Boye Rasmussen
Metadata is key - the most important data after data
- Research Article
- 10.29173/iq924
- Jul 18, 2018
- IASSIST Quarterly
- Benjamin Peuch
Belgium has recently decided to join the Consortium of European Social Science Data Archives (CESSDA). The Social Sciences Data Archive (SODA) project aims at tackling the different challenges entailed by the setting up of a new research infrastructure in the form of a data archive. The SODA project involves an archival institution, the State Archives of Belgium, which, like most other large archival repositories around the world, works with Encoded Archival Description (EAD) for managing its metadata. There exists at the State Archives a large pipeline of programs and procedures that processes EAD documents and channels their content through different applications, such as the online catalog of the institution. Because there is a chance that the future Belgian data archive will be part of the State Archives and because DDI is the most widespread metadata standard in the social sciences as well as a requirement for joining CESSDA, the State Archives have developed a DDI-to-EAD crosswalk in order to re-use the State Archives' infrastructure for the needs of the future Belgian service provider. Technical illustrations highlight the conceptual differences between DDI and EAD and how these can be reconciled or escaped for the purpose of a data archive for the social sciences.
- Research Article
3
- 10.29173/iq783
- Feb 24, 2017
- IASSIST Quarterly
- Chifundo Kanjala + 7 more
Open-access for existing LMIC demographic surveillance data using DDI
- Research Article
1
- 10.1002/pra2.2017.14505401049
- Jan 1, 2017
- Proceedings of the Association for Information Science and Technology
- Rachel D Williams + 2 more
This paper uses boundary work theory (Gieryn, 1983, among others) to analyze the differences in organizational participation in the creation and maintenance of the Data Documentation Initiative (DDI) standard. The Data Documentation Initiative is a global consensus standard created to describe social science research data. Specifically, the paper addresses how two key social science data archives (SSDA), the Inter-university Consortium for Political and Social Research (ICPSR) and the United Kingdom Data Archive (UKDA), mobilized their differences as resources to create DDI. This paper describes the collaborative activities of the two organizations. It also analyzes how those differences in collaboration resulted in boundary or “juncture” activities and the role those activities played in organizational maintenance. Our study compares how one organization, ICPSR, engaged in translating and aligning activities related to the development of the standard, while the other data archive, UKDA, engaged in decentering activities. The paper uses this case of standards work to reflect on the role of boundaries as resources for organizational resilience over the long‐term.
- Research Article
- 10.3233/sji-161011
- Nov 15, 2016
- Statistical Journal of the IAOS
- Haitham Zeidan + 1 more
This paper discusses the experience of the Palestinian Central Bureau of Statistics (PCBS) in designing a user-friendly framework for metadata and microdata documentation. The PCBS uses two metadata specifications: the Data Documentation Initiative (DDI) and the Dublin Core Metadata Initiative (DCMI). Both are defined in the Extensible Markup Language (XML) and the Resource Description Framework (RDF). This paper also focuses on the DDI and DCMI as well as their relationship to other relevant metadata standards (e.g., the Statistical Data and Metadata Exchange (SDMX)) and semantic web technologies. We address the features of these standards, such as richer content, coverage, online analytical capability, search capability, and interoperability, since these standards are defined in XML.
- Research Article
1
- 10.29173/iq898
- Dec 11, 2015
- IASSIST Quarterly
- Thomas Bosch + 1 more
Use Cases Related to an Ontology of the Data Documentation Initiative
- Research Article
15
- 10.5334/jophd.ai
- Sep 22, 2014
- Open Health Data
- Barry T Radler
Midlife in the United States (MIDUS) is a national longitudinal study of health and well-being (http://midus.wisc.edu/). It was conceived by a multidisciplinary team of scholars interested in understanding aging as an integrated bio-psycho-social process, and as such it includes data collected in a wide array of research protocols using a variety of survey and non-survey instruments. The data captured by these different protocols (comprising around 20,000 variables) represent survey measures, cognitive assessments, daily stress diaries, clinical, biomarker and neuroscience data which are contained in separate flat or stacked data files with a common ID system that allows easy data merges among them. All MIDUS datasets and documentation are archived at the ICPSR (http://www.icpsr.umich.edu/) repository at the University of Michigan and are publicly available in a variety of formats and statistical packages. Special attention is given to providing clear user-friendly documentation; the study has embraced the Data Documentation Initiative (DDI) metadata standard and produces DDI-Lifecycle compliant codebooks. Potential for secondary use of MIDUS is high and actively encouraged. The study has become very popular with the research public as measured by data downloads and citation counts (see Reuse Potential below).