Machine-Actionability and Evolvability in Data Stewardship Planning: Framework, Implementation, and Case Study

Similar Papers
  • Research Article
  • Cited by 1
  • 10.1002/pra2.89
The convergence of computational and social approaches for unveiling meaningful and valuable data
  • Jan 1, 2019
  • Proceedings of the Association for Information Science and Technology
  • Angela P Murillo + 3 more

The current data paradigm calls for a more integrated and comprehensive framework for making sense of data and the issues it raises. From the perspective of the data life cycle, we argue that computational and social approaches complement each other in confronting data challenges. Computational approaches consist of ETL (extract, transform, and load), modeling, and machine learning techniques; social approaches include policy and regulation, data sharing and reuse behavior, reproducibility, and ethical and privacy issues. In this panel, we frame these two approaches as data acumen and data stewardship. Merging the two perspectives allows data not only to become discoverable, accessible, and interoperable, but also to reveal meaningful patterns and serve as supporting evidence for important decision-making. The opening facilitator and three panelists will report on their recent studies of this convergence of data acumen and stewardship, sharing research insights from case studies in three disciplines: agriculture, biomedicine, and archeology.
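
The panel abstract names ETL as the backbone of the computational approach. As a minimal, hypothetical sketch of that pattern in Python (the file names and field names are invented for illustration, not taken from the panel):

```python
import csv
import json

def extract(path):
    """Extract: read raw records from a CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(records):
    """Transform: normalize field names and types, drop incomplete rows."""
    cleaned = []
    for row in records:
        if not row.get("site") or not row.get("measurement"):
            continue  # skip rows with missing values
        cleaned.append({
            "site": row["site"].strip().lower(),
            "measurement": float(row["measurement"]),
        })
    return cleaned

def load(records, path):
    """Load: write the cleaned records to JSON for downstream modeling/ML."""
    with open(path, "w") as f:
        json.dump(records, f, indent=2)

if __name__ == "__main__":
    # Hypothetical file names for illustration only.
    load(transform(extract("raw_field_data.csv")), "clean_field_data.json")
```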

  • Research Article
  • Cited by 24
  • 10.1016/j.giq.2021.101642
Data-driven government: Cross-case comparison of data stewardship in data ecosystems
  • Dec 30, 2021
  • Government Information Quarterly
  • W Van Donge + 2 more

  • Research Article
  • Cited by 3
  • 10.1108/jd-12-2023-0264
Data stewardship: case studies from North American, Dutch and Finnish universities
  • Sep 27, 2024
  • Journal of Documentation
  • Antti Mikael Rousi + 2 more

Purpose: As national legislation, federated national services, institutional policies and institutional research service arrangements may differ, data stewardship programs may be organized differently in higher education institutions across the world. This work seeks to elaborate the picture of different data stewardship programs running in different institutional and national research environments.
Design/methodology/approach: Utilizing a case study design, this study described three distinct data stewardship programs from Purdue University (United States), Delft Technical University (Netherlands) and Aalto University (Finland). In addition, this work investigated the institutional and national research environments of the programs. The focus was on initiatives led by academic libraries or similar services.
Findings: This work demonstrates that data stewardship programs may be organized differently within varying national and institutional contexts. The data stewardship programs varied in terms of roles, organization and funding structures. Furthermore, policies and legislation, organizational structures and national infrastructures differed.
Research limitations/implications: The data stewardship programs and their contexts develop, and the descriptions presented in this work should be considered snapshots.
Originality/value: This work broadens the current literature on data stewardship by not only providing detailed descriptions of three distinct data stewardship programs but also highlighting how research environments may affect their organization. We present a summary of key factors in the organization of data stewardship programs.

  • Research Article
  • Cited by 7
  • 10.1016/j.comtox.2021.100190
FAIRification of nanosafety data to improve applicability of (Q)SAR approaches: A case study on in vitro Comet assay genotoxicity data
  • Sep 21, 2021
  • Computational Toxicology
  • Cecilia Bossa + 7 more

(Quantitative) structure-activity relationship ([Q]SAR) methodologies are widely applied to predict the (eco)toxicological effects of chemicals, and their use is envisaged in different regulatory frameworks for filling data gaps of untested substances. However, their application to the risk assessment of nanomaterials is still limited, partly due to the scarcity of large and curated experimental datasets. Although a great amount of nanosafety data have been produced over the last decade in international collaborative initiatives, their interpretation, integration and reuse have been hampered by several obstacles, such as poorly described (meta)data, non-standard terminology, and a lack of harmonized reporting formats and criteria. Recently, the FAIR (Findable, Accessible, Interoperable, and Reusable) principles have been established to guide the scientific community in good data management and stewardship. The EU H2020 Gov4Nano project, together with other international projects and initiatives, is addressing the challenge of improving nanosafety data FAIRness to maximize their availability, understanding, exchange and, ultimately, their reuse. These efforts are largely supported by the creation of a common Nanosafety Data Interface, which connects a number of project-specific databases applying the eNanoMapper data model. A wide variety of experimental data relating to the characterization and effects of nanomaterials are stored in the database; however, the methods, protocols and parameters driving their generation are not fully mature. This article reports the progress of an ongoing case study in the Gov4Nano project on the reuse of in vitro Comet genotoxicity data, focusing on the issues and challenges encountered in their FAIRification through the eNanoMapper data model. The case study is part of an iterative process in which the FAIRification of data supports the understanding of the phenomena underlying their generation and, ultimately, improves their reusability.
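
To make the FAIRification goal concrete, here is a minimal, hypothetical sketch of what an ontology-annotated Comet assay record could look like as structured metadata. The field names, identifiers, and term IDs are placeholders and do not reflect the actual eNanoMapper data model:

```python
import json

# Hypothetical metadata record illustrating FAIRification goals: explicit
# provenance, controlled-vocabulary annotations, and a harmonized reporting
# format. All field names and term IDs below are placeholders.
comet_assay_record = {
    "material": {
        "name": "TiO2 nanoparticle",
        "identifier": "NM-000000",             # placeholder registry ID
        "ontology_annotation": "ENM:0000000",  # placeholder ontology term
    },
    "assay": {
        "type": "in vitro Comet assay",
        "endpoint": "DNA strand breaks (% tail DNA)",
        "protocol_reference": "https://example.org/protocols/comet",  # placeholder
    },
    "result": {"value": 12.3, "unit": "percent"},
    "provenance": {"project": "Gov4Nano", "license": "CC-BY-4.0"},
}

print(json.dumps(comet_assay_record, indent=2))
```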

  • Research Article
  • Cited by 7
  • 10.1177/03611981241231804
Practical Application of Digital Twins for Transportation Asset Data Management: Case Example of a Safety Hardware Asset
  • Mar 18, 2024
  • Transportation Research Record: Journal of the Transportation Research Board
  • Ashtarout Ammar + 4 more

Digital Twins have emerged as a transformative technology with immense potential for enhancing asset data management in various industries. The transportation industry relies heavily on properly functioning and well-maintained safety hardware assets, such as crash barriers, guiderails, and road signs, to ensure the safety of road users. However, traditional asset data management practices for ancillary assets often fall short of providing real-time, comprehensive insights into the condition and performance of these critical assets. Digital Twins, by contrast, can analyze data for predictive and preventive maintenance using cutting-edge technologies, enabling asset managers to proactively address potential safety concerns, plan maintenance schedules efficiently, and optimize resource allocation. This can be achieved by leveraging historical asset data and real-time condition updates and providing the Digital Twin with an information-rich analytical model. The key steps in creating an information-rich analytical model are identifying the asset data that should be collected, understanding the data architecture throughout the asset lifecycle, and generating asset data governance and stewardship frameworks. As such, this paper investigates the current state of practice for ancillary asset data collection and management and explores the generation of information-rich analytical models for guiderails, a critical piece of roadway safety hardware. Moreover, this study demonstrates how Digital Twins can be integrated into, and affect the development of, transportation asset management systems. The findings emphasize the significance of adopting Digital Twins in transportation asset data management, and the case example offers a compelling roadmap for other transportation agencies considering integrating Digital Twins into their asset management strategies.
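
As a rough illustration of what an "information-rich" asset record might contain, the following Python sketch models a guiderail segment with time-stamped condition updates. All names, fields, and thresholds are hypothetical, not the paper's actual data model:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ConditionUpdate:
    """A time-stamped condition observation feeding the Digital Twin."""
    timestamp: str        # ISO 8601, e.g. "2024-03-18T09:00:00Z"
    condition_score: int  # hypothetical scale: 1 (failed) to 5 (like new)
    source: str           # e.g. "field inspection", "vehicle-mounted sensor"

@dataclass
class GuiderailAsset:
    """Hypothetical information-rich record for one guiderail segment."""
    asset_id: str
    location_milepost: float
    install_date: str
    material: str
    history: List[ConditionUpdate] = field(default_factory=list)

    def needs_maintenance(self, threshold: int = 2) -> bool:
        """Flag the asset if its latest condition score is at or below threshold."""
        return bool(self.history) and self.history[-1].condition_score <= threshold

rail = GuiderailAsset("GR-0001", 12.4, "2015-06-01", "galvanized steel")
rail.history.append(ConditionUpdate("2024-03-18T09:00:00Z", 2, "field inspection"))
print(rail.needs_maintenance())  # True -> schedule preventive maintenance
```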

  • Book Chapter
  • Cited by 3
  • 10.1007/978-3-662-44722-2_32
Rule-Based Impact Analysis for Enterprise Business Intelligence
  • Jan 1, 2014
  • Kalle Tomingas + 2 more

We address several common problems in the fields of Business Intelligence, Data Warehousing and Decision Support Systems: the complexity of managing, tracking and understanding data lineage and system component dependencies across long chains of data transformations. The paper presents practical methods to calculate meaningful data transformation and component dependency paths, based on program parsing, heuristic impact analysis, probabilistic rules and semantic technologies. Case studies illustrate the aggregation and visualization of the results to address different planning and decision support problems for various user profiles, such as business users, managers, data stewards, system analysts, designers and developers.
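
One simple way to realize probabilistic impact rules, sketched here under assumptions not taken from the paper, is to propagate a per-edge confidence multiplicatively along dependency paths. The component names and weights below are invented:

```python
from collections import defaultdict

# Hypothetical dependency graph: edge (a -> b, p) means component b depends
# on a, and a change in a impacts b with confidence p.
edges = [
    ("source_table", "staging_table", 1.0),
    ("staging_table", "fact_sales", 0.9),
    ("fact_sales", "sales_report", 0.8),
    ("staging_table", "audit_log", 0.5),
]

graph = defaultdict(list)
for src, dst, p in edges:
    graph[src].append((dst, p))

def impact(start):
    """Return the maximum impact confidence of 'start' on each downstream component."""
    scores = {start: 1.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt, p in graph[node]:
            score = scores[node] * p  # confidences multiply along a path
            if score > scores.get(nxt, 0.0):
                scores[nxt] = score
                stack.append(nxt)
    return scores

print(impact("source_table"))
# {'source_table': 1.0, 'staging_table': 1.0, 'fact_sales': 0.9,
#  'sales_report': 0.72, 'audit_log': 0.5}
```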

  • Research Article
  • 10.54660/.ijmrge.2020.1.1.206-220
Big Data Governance in Enterprise Analytics: Frameworks and Best Practices
  • Jan 1, 2020
  • International Journal of Multidisciplinary Research and Growth Evaluation
  • Adeola Okesiji + 6 more

In the era of data-driven decision-making, the effective governance of big data has become a strategic imperative for enterprises seeking to leverage analytics for competitive advantage. This study investigates the role of big data governance in enhancing enterprise analytics, focusing on the development and implementation of robust governance frameworks and best practices. Big data governance encompasses the policies, processes, standards, and roles required to manage the availability, usability, integrity, and security of data assets. As organizations increasingly rely on complex, high-volume, and diverse data sources, the absence of structured governance can lead to data quality issues, compliance risks, and analytics inefficiencies. This research presents a comprehensive review of existing big data governance models and proposes an integrated framework that aligns governance dimensions with enterprise analytics objectives. The framework emphasizes key pillars such as data ownership, metadata management, data stewardship, quality assurance, privacy, and ethical use of data. Using a qualitative approach supported by case studies across sectors including finance, healthcare, and retail, the study identifies common challenges and success factors in deploying governance structures. Findings highlight the critical role of executive sponsorship, cross-functional collaboration, regulatory alignment, and technology enablers such as data catalogs, AI-driven governance tools, and automated lineage tracking in ensuring governance effectiveness. Furthermore, the study outlines best practices for organizations to operationalize data governance in their analytics lifecycle, including the implementation of data governance councils, standardized data definitions, and continuous monitoring mechanisms. By embedding governance into enterprise analytics workflows, organizations can improve data trustworthiness, accelerate insights, and drive better strategic outcomes. The proposed framework also supports compliance with data protection regulations such as GDPR, HIPAA, and CCPA. This research contributes to both academic discourse and practical applications by bridging the gap between big data governance theory and enterprise analytics execution. It offers actionable guidance for data leaders, CIOs, and analytics teams aiming to enhance data governance maturity and maximize the value of their data assets.
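
As a hedged sketch of how the framework's pillars (ownership, stewardship, classification, standardized definitions, lineage) might surface in a data catalog entry, consider the following Python example. The record layout and values are hypothetical, not the paper's framework:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogEntry:
    """Hypothetical data-catalog record tying one dataset to governance pillars."""
    dataset: str
    owner: str           # accountable business owner
    steward: str         # day-to-day data steward
    classification: str  # e.g. "public", "internal", "restricted"
    definition: str      # standardized business definition
    upstream: List[str] = field(default_factory=list)  # lineage: source datasets
    quality_checked: bool = False

catalog = [
    CatalogEntry(
        dataset="customer_orders",
        owner="Head of Sales Operations",
        steward="sales-data-steward@example.org",
        classification="restricted",  # contains personal data -> GDPR scope
        definition="One row per confirmed customer order, net of cancellations.",
        upstream=["crm_export", "payments_ledger"],
        quality_checked=True,
    ),
]

# A continuous-monitoring pass might simply flag unchecked assets for review:
for entry in catalog:
    if not entry.quality_checked:
        print(f"Review needed: {entry.dataset}")
```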

  • Research Article
  • Cited by 6
  • 10.1162/dint_a_00184
The Impact of COVID-19 and FAIR Data Innovation on Distance Education in Africa
  • Oct 1, 2022
  • Data Intelligence
  • Akinyinka Tosin Akindele + 6 more

Prior to the advent of the COVID-19 pandemic, distance education, a mode of education that allows teaching and learning to occur beyond the walls of traditional classrooms using electronic media and online delivery practices, was not widely embraced as a credible alternative mode of delivering education, especially in Africa. In education, the pandemic and the measures to contain it created a need for virtual learning/teaching and showcased the potential of distance education. This article explores the potential of distance education with an emphasis on the role played by COVID-19, the technologies employed, and the benefits, as well as how data stewardship can enhance distance education. It also describes how distance education can make learning opportunities available to the less privileged, geographically displaced, dropouts, housewives, and even workers, enabling them to partake in education while being engaged in other productive aspects of life. A case study is provided on the Dutch Organisation for Internationalisation in Education (NUFFIC) Digital Innovation Skills Hub (DISH) project, which is implemented via distance education and targeted towards marginalised individuals such as refugees and displaced persons in Ethiopia, Somalia, and other conflict zones, aiming to provide them with critical and soft skills for paid remote work. This case study shows that distance education is the way forward in education today, as it has the capability to reach millions of learners simultaneously, lifting people out of poverty and increasing productivity, while helping to ensure that the world is a better place for future generations.

  • Preprint Article
  • 10.52843/cassyni.42tgbf
Driving Strategic Decisions Using Research Analytics: A Dual Case Study on Data Governance and Management
  • Apr 8, 2025

In an era of increased regulatory demands and complex datasets, research institutions must ensure compliance, security, and strategic decision-making through effective data governance and management. This session will feature two case studies from Saint Louis University (SLU) and the University of Massachusetts Amherst (UMass Amherst), emphasizing innovative approaches to data governance and management for institutional success. The first case study delves into SLU's collaborative efforts to align data management rules with government regulations, highlighting the importance of cloud infrastructure in creating scalable and safe research environments. It describes SLU's implementation of a comprehensive data governance approach to maintain regulatory compliance, including FIPS compliance, while also cultivating a culture of data stewardship. Best practices for protecting sensitive research data, reducing risks, and empowering researchers will be provided. The second case study focuses on UMass Amherst's data-driven approach to global engagement strategy. A consolidated data repository was created to aid institutional decision-making, incorporating a variety of data points such as J-1 visiting scholars, co-publications, memoranda of understanding (MoUs), awards and projects, exchange students, and Fulbright Scholars. The discussion will cover data collection, cleansing, and effective management processes, with a focus on how these practices support internationalization, compliance, and strategic alliances. This session will provide attendees with practical strategies for improving institutional internationalization and research management through effective data governance and centralized data platforms. The insights shared aim to empower higher education professionals to align their data management practices with institutional goals, regulatory requirements, and global engagement initiatives.

  • Conference Article
  • Cited by 1
  • 10.1145/3394486.3411072
Models of Data Governance and Advancing Indigenous Genomic Data Sovereignty
  • Aug 20, 2020
  • Krystal S Tsosie

While there has been considerable deliberation about ownership and stewardship of genomic data, there does not yet exist a singular framework that encapsulates the current and future trajectory of how these data governance models can exist for Indigenous communities. We succinctly describe two case studies in the Akimel O'odham (Pima) communities that demonstrate the spectrum of data governance structures, in which tribal members have anywhere from no input to complete control over data collection and usage. We describe (1) tribal-trust relationships, (2) non-tribal partnerships, and (3) tribally-driven models in the context of an Indigenous people whose genomic and health data have been widely misused and exploited by outside researchers, and the new narrative in which the O'odham have begun re-asserting their sovereignty in data domains. We demonstrate various strategies and models that communities and researchers can use to discuss data governance for their own best practices, institutions, and community members.

  • Research Article
  • Cited by 1
  • 10.3897/rio.8.e94042
A Multi-omics Data Analysis Workflow Packaged as a FAIR Digital Object
  • Aug 25, 2022
  • Research Ideas and Outcomes
  • Anna Niehues + 10 more

In current biomedical and complex trait research, increasing numbers of large molecular profiling (omics) data sets are being generated. At the same time, many studies fail to be reproduced (Baker 2016, Kim 2018). In order to improve study reproducibility and data reuse, including integration of data sets of different types and origins, it is imperative to work with omics data that is findable, accessible, interoperable, and reusable (FAIR, Wilkinson 2016) at the source. The data analysis, integration and stewardship pillar of the Netherlands X-omics Initiative aims to facilitate multi-omics research by providing tools to create, analyze and integrate FAIR omics data. Here we report a joint activity of X-omics and the Netherlands Twin Register demonstrating the FAIRification of a multi-omics data set and the development of a FAIR multi-omics data analysis workflow. The implementation of FAIR principles (Wilkinson 2016) can improve scientific transparency and facilitate data reuse. However, Kim (2018) showed in a case study that the availability of data and code is required but not sufficient to reproduce data analyses. They highlighted the importance of interoperable and open formats, and structured metadata. In order to increase research reproducibility at the data analysis level, additional practices such as version control, code licensing, and documentation have been proposed. These include recommendations for FAIR software by the Netherlands eScience Center and the Dutch Data Archiving and Networked Services (DANS), and FAIR principles for research software proposed by the Research Data Alliance (Chue Hong 2022). Data analysis in biomedical research usually comprises multiple steps, often resulting in complex data analysis workflows and requiring additional practices, such as containerization, to ensure transparency and reproducibility (Goble 2020, Stoudt 2021). We apply these practices to a multi-omics data set that comprises genome-wide DNA methylation profiles, targeted metabolomics, and behavioral data of two cohorts that participated in the ACTION Biomarker Study (ACTION, Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies, see consortium members in Suppl. material 1) (Boomsma 2015, Bartels 2018, Hagenbeek 2020, van Dongen 2021, Hagenbeek 2022). The ACTION-NTR cohort consists of twins that are either longitudinally concordant or discordant for childhood aggression. The ACTION-Curium-LUMC cohort consists of children referred to the Dutch LUMC Curium academic center for child and youth psychiatry. With the joint analysis of multi-omics data and behavioral data, we aim to identify substructures in the ACTION-NTR cohort and link them to aggressive behavior. First, the individuals are clustered using Similarity Network Fusion (SNF, Wang 2014), and latent feature dimensions are uncovered using different unsupervised methods including Multi-Omics Factor Analysis (MOFA) (Argelaguet 2018) and Multiple Correspondence Analysis (MCA, Lê 2008, Husson 2017). In a second step, we determine correlations between omics and phenotype dimensions, and use them to explain the subgroups of individuals from the ACTION-NTR cohort. In order to validate the results, we project data of the ACTION-Curium-LUMC cohort onto the latent dimensions and determine whether correlations between omics and phenotype data can be reproduced. Integration of data across cohorts and across data types requires interoperability.
We applied different practices to make the data FAIR, including conversion of files to community-standard formats, and capturing experimental metadata using the ISA (Investigation, Study, Assay) metadata framework (Johnson 2021) and ontology-based annotations. All data analysis steps, including pre-processing of the different omics data types, were implemented in either R or Python and combined in a modular Nextflow (Di Tommaso 2017) workflow, where the environment for each step is provided as a Singularity (Kurtzer 2017) container. The analysis workflow is packaged in a Research Object Crate (RO-Crate) (Soiland-Reyes 2022). The RO-Crate is a FAIR digital object that contains the Nextflow workflow, including ontology-based annotations of each analysis step. Since omics data are considered potentially personally identifiable, the packaged workflow contains a minimal synthetic data set resembling the original data structure. Finally, the code is made available on GitHub and the workflow is registered at WorkflowHub (Goble 2021). Since our Nextflow workflow is set up in a modular manner, the individual analysis steps can be reused in other workflows. We demonstrate this reusability by applying different sub-workflows to data from two different cohorts.
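
For readers unfamiliar with RO-Crate, the following sketch hand-writes a minimal ro-crate-metadata.json descriptor to illustrate the overall shape of such a crate. A real crate like the one described here would be produced with dedicated tooling and carry much richer, ontology-based annotations; the file and dataset names are illustrative:

```python
import json

# Minimal hand-written RO-Crate-style descriptor for a workflow package.
# Only illustrates the overall JSON-LD shape; names are placeholders.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Multi-omics analysis workflow (illustrative)",
            "hasPart": [{"@id": "main.nf"}, {"@id": "synthetic_data/"}],
        },
        {
            "@id": "main.nf",
            "@type": ["File", "SoftwareSourceCode"],
            "name": "Nextflow workflow entry point",
            "programmingLanguage": "Nextflow",
        },
        {
            "@id": "synthetic_data/",
            "@type": "Dataset",
            "description": "Minimal synthetic data resembling the original structure.",
        },
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(crate, f, indent=2)
```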

  • Research Article
  • Cited by 25
  • 10.1088/1361-6595/acb28c
Foundations of machine learning for low-temperature plasmas: methods and case studies
  • Feb 1, 2023
  • Plasma Sources Science and Technology
  • Angelo D Bonzanini + 4 more

Machine learning (ML) and artificial intelligence have proven to be an invaluable tool in tackling a vast array of scientific, engineering, and societal problems. The main drivers behind the recent proliferation of ML in practically all aspects of science and technology can be attributed to: (a) improved data acquisition and inexpensive data storage; (b) exponential growth in computing power; and (c) availability of open-source software and resources that have made the use of state-of-the-art ML algorithms widely accessible. The impact of ML on the field of low-temperature plasmas (LTPs) could be particularly significant in the emerging applications that involve plasma treatment of complex interfaces in areas ranging from the manufacture of microelectronics and processing of quantum materials, to the LTP-driven electrification of the chemical industry, and to medicine and biotechnology. This is primarily due to the complex and poorly-understood nature of the plasma-surface interactions in these applications that pose unique challenges to the modeling, diagnostics, and predictive control of LTPs. As the use of ML is becoming more prevalent, it is increasingly paramount for the LTP community to be able to critically analyze and assess the concepts and techniques behind data-driven approaches. To this end, the goal of this paper is to provide a tutorial overview of some of the widely-used ML methods that can be useful, amongst others, for discovering and correlating patterns in the data that may be otherwise impractical to decipher by human intuition alone, for learning multivariable nonlinear data-driven prediction models that are capable of describing the complex behavior of plasma interacting with interfaces, and for guiding the design of experiments to explore the parameter space of plasma-assisted processes in a systematic and resource-efficient manner. We illustrate the utility of various supervised, unsupervised and active learning methods using LTP datasets consisting of commonly-available, information-rich measurements (e.g. optical emission spectra, current–voltage characteristics, scanning electron microscope images, infrared surface temperature measurements, Fourier transform infrared spectra). All the ML demonstrations presented in this paper are carried out using open-source software; the datasets and codes are made publicly available. The FAIR guiding principles for scientific data management and stewardship can accelerate the adoption and development of ML in the LTP community.
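
As a hedged illustration of the kind of supervised baseline such a tutorial covers, the sketch below fits a cross-validated PCA-plus-ridge pipeline to synthetic stand-in "spectra"; real optical emission spectra and measured plasma parameters would replace the random data:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-in for optical emission spectra: 200 samples x 500 wavelengths.
# In practice each row would be a measured spectrum and y a plasma parameter
# (e.g. an electron temperature estimate from diagnostics).
X = rng.normal(size=(200, 500))
y = X[:, :10].sum(axis=1) + 0.1 * rng.normal(size=200)  # hidden linear structure

# Dimensionality reduction (PCA) followed by regularized regression is a
# common baseline for high-dimensional, collinear spectral data.
model = make_pipeline(StandardScaler(), PCA(n_components=20), Ridge(alpha=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Cross-validated R^2: {scores.mean():.2f} +/- {scores.std():.2f}")
```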

  • Research Article
  • Cited by 1
  • 10.5339/qjph.2023.8
Transforming Primary Healthcare Services with Centralized Health Intelligence: A Case Study from Qatar
  • Feb 2, 2024
  • Qatar Journal of Public Health
  • Mujeeb C Kandy + 4 more

Centralized Health Intelligence (CHI) represents an ecosystem where data, technology, and expertise converge to elevate the standards of healthcare services. In this case study, we explore the pivotal role of CHI in reshaping the healthcare paradigm at the Primary Health Care Corporation (PHCC) in Qatar. Adopting clinical information systems led to a wealth of data, necessitating the transformation of the conventional health information management team into the Business and Health Intelligence (BHI) Department. The establishment of a centralized enterprise data warehouse (EDW) and the use of business intelligence tools by the BHI Department further contributed to improved data governance. A meticulous Centralized Health Intelligence Framework was devised to ensure the effective use of PHCC's data assets. This framework encompasses policies and procedures related to data stewardship, information needs assessment, data classification, data privacy, and data literacy requirements. Through this study, we demonstrate how PHCC has harnessed CHI to redefine healthcare delivery by providing relevant stakeholders with easy access to advanced data analytics and data-driven decision-making tools. Implementing CHI infrastructure presents many challenges, including financial, technical, and organizational hurdles. Despite these challenges, CHI significantly benefits relevant stakeholders by providing enhanced access to health data, fostering active engagement. However, addressing data comprehension, security, and privacy concerns remains critical.

  • Research Article
  • Cited by 27
  • 10.1017/dap.2020.14
Emerging health data platforms: From individual control to collective data governance
  • Jan 1, 2020
  • Data & Policy
  • Timothy Kariotis + 7 more

Health data have enormous potential to transform healthcare, health service design, research, and individual health management. However, health data collected by institutions tend to remain siloed within those institutions, limiting access by other services, individuals or researchers. Further, health data generated outside health services (e.g., from wearable devices) may not be easily accessible or usable by individuals or connected to other parts of the health system. There are ongoing tensions between data protection and the use of data for the public good (e.g., research). Concurrently, a number of data platforms provide ways to disrupt these traditional health data siloes, giving greater control to individuals and communities. Through four case studies, this paper explores platforms providing new ways for health data to be used for personal data sharing, self-health management, research, and clinical care. The case studies cover the data platforms PatientsLikeMe, Open Humans, Health Record Banks, and unforgettable.me, which are explored with regard to what they mean for data access, data control, and data governance. The case studies provide insight into a shift from institutional to individual data stewardship. Looking at emerging data governance models, such as data trusts and data commons, points to collective control over health data as an emerging approach to issues of data control. These shifts pose challenges as to how “traditional” health services make use of data collected on these platforms. Further, they raise broader policy questions regarding how to decide which public goods data should be put towards.

  • Research Article
  • Cited by 51
  • 10.1016/j.jenvman.2013.08.046
Keeping it wild: Mapping wilderness character in the United States
  • Oct 30, 2013
  • Journal of Environmental Management
  • Steve Carver + 2 more
