Taking Digital Responsibility for Data: Toward a Governance Model for User-Generated Data
As firms increasingly rely on user-generated data (UGD) to drive decision-making, questions of digital responsibility have become central to contemporary data governance. This article develops a governance model for UGD that integrates the ecosystem view with three complementary dimensions (object, authority, and subject) to clarify what is governed, who holds governing power, and who is affected. This model captures the distributed and relational nature of governance across interconnected actors and infrastructures. While the model is designed to be broadly applicable, we illustrate how governance responsibilities shift between different phases in the lifecycle of UGD, such as its initial capture and subsequent use. By linking these lifecycle moments to the roles and relationships within the ecosystem, we highlight how technical, organizational, and stakeholder considerations converge in practice. The model emphasizes transparency, contextual integrity, and distributed accountability as guiding principles for responsible data management, offering actionable guidance for managers and policymakers navigating the tension between data-driven innovation and digital responsibility.
- Research Article
165
- 10.1016/j.ijinfomgt.2021.102331
- Feb 19, 2021
- International Journal of Information Management
From user-generated data to data-driven innovation: A research agenda to understand user privacy in digital markets
- Conference Article
9
- 10.1109/bigdatase50710.2020.00014
- Dec 1, 2020
Utilisation of transfer learning with deep language models is regarded as one of the most important developments in deep learning. Their application to real-time, high-velocity, high-volume user-generated data has been elusive due to the unprecedented size and complexity of the models, which results in substantial computational overhead. Recent iterations of these architectures have produced significantly distilled models with state-of-the-art performance and reduced resource requirements. We utilize deep transformer language models on user-generated data alongside a robust text normalization pipeline to address what is considered the Achilles' heel of deep learning on user-generated text data, namely data normalization. In this paper, we propose a framework for the ingestion, analysis and storage of real-time data streams. A case study in sentiment analysis and offensive/hateful language detection is used to evaluate the framework. We demonstrate inference on a large Twitter dataset using CPU and GPU clusters, highlighting the viability of the fine-tuned distilled language model for high-volume data. The fine-tuned model significantly outperforms the previous state of the art on several benchmark datasets, providing a powerful model that can be utilized for a variety of downstream tasks. To our knowledge, this is the only study demonstrating powerful transformer language models for real-time social media stream analytics in a distributed setting.
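The core of the described setup, distilled-transformer inference behind a normalization step, can be pictured with a short sketch. This is a minimal stand-in, not the authors' pipeline: it assumes the public `distilbert-base-uncased-finetuned-sst-2-english` checkpoint from Hugging Face and a toy `normalize` function in place of the paper's robust normalization stage.

```python
# A minimal sketch, not the authors' pipeline: distilled-transformer sentiment
# inference behind a toy normalization step.
from transformers import pipeline

# Public checkpoint used as a stand-in for the paper's fine-tuned model.
clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,  # -1 = CPU; a GPU index would be used on a GPU cluster
)

def normalize(text: str) -> str:
    """Toy stand-in for the robust text normalization stage."""
    return " ".join(text.lower().split())

stream = ["GREAT product!!!", "this is awful :("]  # stand-in for a Twitter stream
for post in stream:
    result = clf(normalize(post))[0]
    print(f"{post!r} -> {result['label']} ({result['score']:.3f})")
```

In a distributed deployment, the same call would sit inside a stream consumer, with posts batched per worker before inference.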
- Research Article
12
- 10.4230/dagman.7.1.1
- Jan 1, 2018
The area of Principles of Data Management (PDM) has made crucial contributions to the development of formal frameworks for understanding and managing data and knowledge. This work has involved a rich cross-fertilization between PDM and other disciplines in mathematics and computer science, including logic, complexity theory, and knowledge representation. We anticipate ongoing expansion of PDM research as the technology and applications involving data management continue to grow and evolve. In particular, the lifecycle of Big Data Analytics raises a wealth of challenge areas that PDM can help with. In this report we identify some of the most important research directions where the PDM community has the potential to make significant contributions. This is done from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term.
- Research Article
16
- 10.1016/j.ijinfomgt.2022.102595
- Nov 14, 2022
- International Journal of Information Management
Data-driven innovation, which explores big data technologies to gain more insights and advantages for product design, has received increasing attention. In user experience (UX) based design innovation, user-generated data and archived design documents are two valuable resources for various design activities such as identifying opportunities and generating design ideas. However, these two resources are usually isolated in different systems. Additionally, design information, typically represented in functional terms, is of limited use for UX-oriented design. To facilitate experience-oriented design activities, we propose a twin data-driven approach to integrate UX data and archived design documents. In particular, we aim to extract UX concepts from product reviews and design concepts from patents, respectively, and to discover associations between the extracted concepts. First, a UX-integrated design information representation model is proposed to associate capabilities with key elements of UX at the concept, category, and aspect levels of information. Based on this model, a twin data-driven approach is developed to bridge experience information and design information. It contains three steps: experience aspect identification using an attention-based LSTM (long short-term memory) network, design information categorization based on topic clustering using BERT (Bidirectional Encoder Representations from Transformers) and LDA (Latent Dirichlet Allocation) models, and experience needs and design information integration by leveraging word embedding techniques to measure concept similarity. A case study using healthcare-related experience and design information demonstrates the feasibility and effectiveness of this approach.
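The third step, matching experience needs to design information by embedding similarity, reduces to a nearest-neighbour search under cosine similarity. The sketch below uses toy three-dimensional vectors in place of trained word embeddings; the concept names and vectors are invented for illustration.

```python
# Toy illustration of the integration step: associating UX concepts with design
# concepts by cosine similarity over embeddings. Vectors and names are invented
# stand-ins for trained word embeddings.
import numpy as np

ux_concepts = {
    "comfortable grip": np.array([0.9, 0.1, 0.3]),
    "easy to clean": np.array([0.2, 0.8, 0.1]),
}
design_concepts = {
    "ergonomic handle": np.array([0.85, 0.15, 0.25]),
    "detachable cover": np.array([0.25, 0.75, 0.2]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each UX concept is linked to its most similar design concept.
for ux, vec in ux_concepts.items():
    best = max(design_concepts, key=lambda name: cosine(vec, design_concepts[name]))
    print(f"{ux} -> {best} (similarity {cosine(vec, design_concepts[best]):.2f})")
```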
- Research Article
9
- 10.1108/pr-07-2022-0491
- Jun 5, 2023
- Personnel Review
Purpose: Organizations that prioritize humanistic responsibility create an environment of value for their employees as the most important stakeholders. However, despite the numerous corporate social responsibility (CSR) models and research highlighting stakeholder considerations, the long-standing "social" aspect of CSR has inhibited its humanistic responsibility. In response, this study moves beyond the antecedents and outcomes of CSR to explore how perceived CSR can promote humanistic responsibility both inside and outside organizations. Design/methodology/approach: The authors followed Sendjaya et al.'s (2008) methodology for developing and validating the perceived corporate humanistic responsibility (CHR) scale. Study 1 validated the CHR scale's content. Study 2 established the measure's reliability, internal consistency, unidimensionality and discriminant validity. Each study is described in the corresponding section. Findings: This research produced a comprehensive set of perceived CHR items for business leaders based on earlier CHR/humanism concepts. Through the deconstruction of CHR theory, the granular conceptualization identifies employee-centric workplaces, healthy internal communication, holistic compensation, CSR-committed behaviors, and holistic training and development, equipping organizations to assess how their CHR fosters humanistic workplaces that encourage socially responsible behaviors. This would have an immense impact on employee well-being, which in turn promotes societal well-being. Research limitations/implications: Although the perceived CHR scale's psychometric properties were confirmed using multiple tests ranging from qualitative to quantitative studies, this newly developed scale requires further investigation to explore whether internal or external relevance factors affect organizations' humanistic responsibility. Practical implications: CSR is about caring for humans and the planet. The authors unpack what the human side of CSR is and how it operates, so that business leaders can advance their CHR practices and responsible management learning. The perceived CHR dimensions can guide business leaders in promoting multidimensional humanistic behaviors inside and outside workplaces, articulating how the "social" aspect of CSR ought to function for employee well-being first. Social implications: This study responds to the Sustainable Development Goals (SDGs), aligning most closely with SDG 3 (good health and well-being) and SDG 8 (decent work and economic growth), by promoting humanistic workplaces, with implications for the United Nations' Principles for Responsible Management Education, which encourage universities to educate students on humanism concepts in business management. Originality/value: The originality lies in the empirical study of CHR. By incorporating the original concepts of humanism/humanistic management and CHR, the authors empirically articulate how CHR may be practically implemented as an elaborated humanistic synthesis for corporations.
- Research Article
5
- 10.1007/978-1-61779-842-9_12
- Jan 1, 2012
- Methods in molecular biology (Clifton, N.J.)
In this chapter, we outline some basic principles for the consistent management of immunogenetic data. These include the preparation of a single master data file that can serve as the basis for all subsequent analyses, a focus on the quality and homogeneity of the data to be analyzed, the documentation of the coding systems used to represent the data, and the application of nomenclature standards specific for each immunogenetic system being evaluated. The data management principles discussed here are intended to provide a foundation for the data analysis methods detailed in Chaps. 13 and 14. The relationship between the data management and analysis methods covered in these three chapters is illustrated in Fig. 3. The application of these data management principles is a first step toward consistent and reproducible data analyses. While it may take extra time and effort to apply them, we feel that it is better to take this approach than to assume that low data quality can be compensated for by large sample sizes. In addition to their relevance for analytical reproducibility, it is important to consider these data management principles from an ethical perspective. The reliability of the data collected and generated as part of a research study should be as important a component of the ethical review of a research application as the security of those data. Finally, in addition to ensuring the integrity of the data from collection to publication, the application of these data management principles will provide a means to foster research integrity and to improve the potential for collaborative data sharing.
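As a concrete illustration of the single-master-file principle, the sketch below merges two per-system source files into one analysis-ready table and records the coding systems alongside it. File names, columns, and coding labels are hypothetical, invented for the example.

```python
# Minimal sketch of the single-master-file principle: merge per-system source
# files into one analysis-ready table and document the coding systems used.
# File names, columns, and coding labels are hypothetical.
import pandas as pd

hla = pd.read_csv("hla_typing.csv")     # e.g., columns: sample_id, hla_a, hla_b
kir = pd.read_csv("kir_genotypes.csv")  # e.g., columns: sample_id, kir2dl1, ...

# One row per sample; validate= catches accidental duplicates at merge time.
master = hla.merge(kir, on="sample_id", how="outer", validate="one_to_one")

# Document the coding systems alongside the data.
master.attrs["coding"] = {
    "hla_*": "IMGT/HLA allele nomenclature",
    "kir_*": "gene presence/absence coded 0/1",
}

master.to_csv("master_immunogenetic_data.csv", index=False)
```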
- Front Matter
3
- 10.1074/mcp.e120.001985
- Apr 1, 2020
- Molecular & Cellular Proteomics
The Data Must Be Accessible to All
- Research Article
1
- 10.3897/biss.3.37190
- Jun 19, 2019
- Biodiversity Information Science and Standards
Major environmental and biodiversity changes and new developments in technology have changed the way we live and work and how we create our future. The main attention of biodiversity researchers nowadays is focused on applying developments in digital technology for fast, efficient and ethical solutions to biodiversity data management and stewardship, so that everyone can benefit from better data, better science and better policies. Here, an originally developed five-point biodiversity data research and management programme is presented: (1) Open Biodiversity Data (OBD) that are usable, useful and used to promote the knowledge and preservation of biodiversity; (2) linking OBD and making it useful for everybody, so that knowledge of biodiversity is for many and not for few; (3) the global web as our work platform, with OBD creating an open community of contributors and users; (4) open is FAIR, following the FAIR Guiding Principles for biodiversity data management and stewardship (EU Commission 2018); and (5) circular OBD as a high-value foundation for artificial intelligence. It is of great importance for society, communities and on a global scale to have unbiased and direct access to Open Biodiversity Data, as this maximises the impact on decision making and the development of efficient solutions to tackle environmental problems (De Prins 2016, De Prins 2017, De Prins and De Prins 2019a, De Prins and De Prins 2019b, De Prins and Serna 2018, European Commission 2018). While embracing the latest developments in technology, in particular high-resolution visual recognition tools and computer vision technologies for working with huge image galleries and linking with other fields of biodiversity data (genetic sequences, distribution data), we are going to create an open international community of knowledgeable users responsible for OBD. I suggest implementing the FAIR guiding principles: Findability, Accessibility, Interoperability and Reusability. Following the FAIR principles, OBD become accessible to everybody; they are findable, visual, flexible, interchangeable in different formats, updated and reused for all biodiversity-related questions. This approach will ensure the use and reuse of data in the most efficient way, so that biodiversity knowledge and information are based on verified and clean data.
- Research Article
9
- 10.1026/0033-3042/a000514
- Apr 1, 2021
- Psychologische Rundschau
Providing access to research data collected as part of publications and publicly funded research projects is now regarded as a central aspect of an open and transparent research practice and is increasingly being called for by funding institutions and journals. To this end, researchers should strive to comply with the so-called FAIR principles of data management, that is, research data should be findable, accessible, interoperable, and reusable. Systematic data management supports these goals and, at the same time, makes it possible to achieve them efficiently. With these revised recommendations on data management and data sharing, which also draw on feedback from a 2018 survey of its members, the German Psychological Society (Deutsche Gesellschaft für Psychologie; DGPs) specifies important basic principles of data management in psychology. Initially, based on discipline-specific definitions of raw data, primary data, secondary data, and metadata, we provide recommendations on the degree of data processing necessary when publishing data. We then discuss data protection as well as aspects of copyright and data usage before defining the qualitative requirements for trustworthy research data repositories. This is followed by a detailed discussion of pragmatic aspects of data sharing, such as the differences between Type 1 and Type 2 data publications, restrictions on data availability (embargo periods), the definition of scientific use by secondary users of shared data, and recommendations on how to resolve potential disputes. Particularly noteworthy is the new recommendation of distinct access categories for data, each with different requirements in terms of data protection or research ethics. These range from completely open data without usage restrictions (access category 0) to data shared under a set of standardized conditions (e.g., reuse restricted to specific purposes; access category 1), individualized usage agreements (access category 2), and secure data under strictly controlled conditions (e.g., in a research data center; access category 3). The practical implementation of this important innovation, however, will require data repositories to provide the necessary technical functionalities. In summary, the revised recommendations aim to present pragmatic guidelines for researchers to handle psychological research data in an open and transparent manner, while addressing structural challenges to data-sharing solutions that are beneficial for all involved parties.
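The four access categories lend themselves to a simple machine-readable encoding. The sketch below is one possible representation, not an official DGPs artifact; the labels paraphrase the categories described above.

```python
# One possible machine-readable encoding of the four access categories; labels
# paraphrase the recommendations and are not an official DGPs artifact.
from enum import IntEnum

class AccessCategory(IntEnum):
    OPEN = 0          # completely open data, no usage restrictions
    STANDARDIZED = 1  # standardized conditions, e.g., reuse for specific purposes
    INDIVIDUAL = 2    # individualized usage agreements
    SECURE = 3        # strictly controlled access, e.g., research data center

def needs_individual_agreement(category: AccessCategory) -> bool:
    """Categories 2 and 3 require negotiation beyond standardized terms."""
    return category >= AccessCategory.INDIVIDUAL

print(needs_individual_agreement(AccessCategory.STANDARDIZED))  # False
print(needs_individual_agreement(AccessCategory.SECURE))        # True
```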
- Research Article
2
- 10.22032/dbt.37811
- Jan 1, 2018
The need to fulfil the FAIR guiding principles for data management and publication [1] directly affects researchers, i.e., data producers as well as data managers. Data management has to be set up well at an early stage of the data life cycle. This is demonstrated by a best-practice work- and dataflow, 'Fungal community barcoding data', which was established as a side product in the context of the project 'GBOL 2 Mycology' of the German Barcode of Life initiative (https://www.bolgermany.de/). The work- and dataflow was set up by applying the newly published MOD-CO schema, version 1.0, which has been implemented as an instance of the database application DiversityDescriptions for data management and for making data compliant with the GFBio infrastructure for data archiving and publication.
The comprehensive conceptual schema MOD-CO for 'Meta-Omics Data of Collection Objects', version 1.0, was published as a Linked Open Data representation in spring 2018 [2]. The process-oriented schema describes operations and object properties along the work- and dataflow, from gathering environmental samples through the various transformation, transaction, and measurement steps in the laboratory up to sample and data publication and archiving. By supporting various kinds of relationships, the MOD-CO schema allows for the concatenation of individual records of the operational steps along a workflow. The MOD-CO descriptor structure in version 1.0 comprises 653 descriptors (concepts) and 1,810 predefined descriptor states, organised in 37 concept collections. Version 1.0 is available as various schema representations of identical content (https://www.mod-co.net/wiki/Schema_Representations).
This schema has been implemented as a data structure in the relational database DiversityDescriptions (DWB-DD) (https://diversityworkbench.net/Portal/DiversityDescriptions), a generic component of the Diversity Workbench environment (https://diversityworkbench.net). DWB-DD is considered appropriate for use as a LIMS (Laboratory Information Management System) and ELN (Electronic Laboratory Notebook) for organising 'Fungal community barcoding data' and similar data collections in molecular laboratories. Its data export interface provides guidance to generate data and metadata in the formats CSV and XML, the latter following the SDD metadata schema with extensions by metadata elements from the EML and ABCD standards; for community standards see: https://gfbio.biowikifarm.net/wiki/Data_exchange_standards,_protocols_and_formats_relevant_for_the_collection_data_domain_within_the_GFBio_network. The research data themselves are organised according to the MOD-CO data schema.
The data package of the work- and dataflow 'Fungal community barcoding data' is going to be submitted to GFBio after having been checked for GFBio compliance, and will be published under a Creative Commons license. Suggestions for standardized citation will be provided, a DOI assigned, and long-term data archiving ensured.
KEYWORDS: DiversityDescriptions, German Barcode of Life (GBOL), German Federation for Biological Data (GFBio), MOD-CO conceptual schema, use case for community barcoding data
REFERENCES:
1. Wilkinson, M.D. et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3: 160018. DOI: 10.1038/sdata.2016.18.
2. Rambold, G., Yilmaz, P., Harjes, J., Link, A., Glöckner, F.O., Triebel, D. 2018. MOD-CO schema: a conceptual schema for processing sample data in meta-omics research (version 1.0). http://mod-co.net/wiki/MOD-CO_Schema_Reference.
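The schema's support for concatenating records along a workflow can be pictured as a minimal provenance chain. The field names below ("step", "derived_from") are hypothetical illustrations, not actual MOD-CO descriptors.

```python
# Toy provenance chain illustrating the concatenation of operational-step
# records along a workflow. Field names ("step", "derived_from") are
# hypothetical illustrations, not actual MOD-CO descriptors.
records = {
    "ENV-001": {"step": "environmental sampling", "derived_from": None},
    "DNA-001": {"step": "DNA extraction", "derived_from": "ENV-001"},
    "SEQ-001": {"step": "amplicon sequencing", "derived_from": "DNA-001"},
}

def lineage(record_id: str) -> list[str]:
    """Walk the derived_from links back to the original sample."""
    chain = []
    while record_id is not None:
        record = records[record_id]
        chain.append(record["step"])
        record_id = record["derived_from"]
    return list(reversed(chain))

print(lineage("SEQ-001"))
# ['environmental sampling', 'DNA extraction', 'amplicon sequencing']
```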
- Research Article
- 10.18844/gjit.v10i2.4764
- Oct 30, 2020
- Global Journal of Information Technology: Emerging Technologies
The digital revolution and the communication platforms of the Web 2.0 virtual space era, such as social media, social networks, and other tools and channels, create new opportunities for better marketing decisions based on user-generated data analysis. Every day, users of social media and other virtual tools create huge amounts of data through their actions, yet businesses lack management tools supporting this process that could create knowledge about deeper cognitive customer profiles and preferences. The growing number of social media users indicates the popularity of these communication tools among the information society, but science today lacks deeper knowledge of social media-generated data and algorithms for using this kind of data. Therefore, the purpose of the article is the development of a conceptual model of big data generated by social media usage in business. The formation of the conceptual model is based on the analysis of big data assumptions and application possibilities, the classification peculiarities of social media and the specifics of different channels, the identification of big data analysis methods, and the analysis of big data applications generated by social media. The conceptual model creates preconditions for deeper knowledge of user-generated big data on today's widely used communication platforms, as well as for the creation of a decision support tool that lets marketing specialists use big data from social media to build deeper cognitive customer profiles and preferences. The methods employed in this research are analysis of the literature and other references, synthesis and logical analysis of information, comparison of information, systemisation and visualisation. Keywords: Big data, data mining, social media, social networks, internet marketing.
- Research Article
- 10.4028/www.scientific.net/amm.713-715.2418
- Jan 1, 2015
- Applied Mechanics and Materials
Data management has experienced three stages: manual management, file systems, and database systems. In this paper, equipment data are managed using a combination of the HDFS file system and the HBase database: the principles of HBase data management are studied; reading and writing processes for equipment data are established; and a data model for the equipment database is designed based on HBase.
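One plausible shape for such a design is sketched below with the `happybase` HBase client: sensor readings live in HBase column families keyed by equipment id and timestamp, while a metadata column points to the raw log file in HDFS. Table, column-family, and key names are invented, and a running HBase Thrift server is assumed.

```python
# Hypothetical sketch of the HBase side of such a design using the happybase
# client. Table, column-family, and key names are invented, and a running
# HBase Thrift server on localhost is assumed.
import happybase

conn = happybase.Connection("localhost")
table = conn.table("equipment")

# Row key = equipment id + timestamp: a common pattern for time-ordered scans.
table.put(
    b"pump-42:20150101T120000",
    {
        b"readings:temperature": b"71.3",
        b"readings:pressure": b"2.4",
        b"meta:hdfs_path": b"/data/equipment/pump-42/2015/01/01.log",  # raw log in HDFS
    },
)

row = table.row(b"pump-42:20150101T120000")
print(row[b"readings:temperature"])
```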
- Research Article
- 10.3390/su17198749
- Sep 29, 2025
- Sustainability
While integrated Artificial Intelligence and Business Analytics (AI-BA) represents a significant advancement in marketing analytics and greatly influences firms’ innovations, there is a considerable gap in current research regarding its impact on technological innovation. This study addresses this gap by exploring how AI-BA affects data-driven and technological innovation, considering the mediating roles of integration capabilities and digital platforms. A theoretical model has been developed based on the dynamic capability view (DCV) and organizational information processing theory (OIPT). The model has been validated using data from enterprises in Saudi Arabia, and Partial Least Squares Structural Equation Modeling (PLS-SEM) has been employed for analysis. The findings demonstrate that AI-BA directly enhances both technological and data-driven innovation. Additionally, it was discovered that data-driven innovation, integration capabilities, and digital platforms mediate these effects, thereby enhancing technological innovation within the respective industries. These findings provide both theoretical and practical insights into the relationship between AI-BA, data-driven innovation, and technological innovation. They enrich the existing literature and provide actionable guidance for practitioners aiming to align their AI-BA with improved technological innovation outcomes.
- Book Chapter
2
- 10.1007/978-3-031-16802-4_41
- Jan 1, 2022
Facing rapidly growing volumes of research datasets, scientists and research funding agencies are putting forward new principles of data management, such as data Findability, Accessibility, Interoperability, and Reusability (FAIR). To this end, data science experts are developing FAIR data policies, methods, protocols, and repositories, while actual research practices are lagging behind because FAIR compliance remains a burden for many researchers. Here we present a prototype data management infrastructure deployed at the National High Magnetic Field Laboratory (NHMFL/MagLab) aimed at helping scientists efficiently annotate and manage experimental data produced by their MagLab projects and making FAIR practices accessible and attractive. The infrastructure incorporates the Open Science Framework (OSF) data repository platform. We will describe infrastructure elements such as the data formats, the metadata schema, the repository integration, the naming conventions, the templates to organize the data, and the automated data pipeline from measurement stations to the FAIR repository objects. Keywords: FAIR data, open science, repository, dataset publishing.
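The final stage of such a pipeline, depositing a measurement file into an OSF project under a naming convention, might look like the following `osfclient` sketch. The token, project id, file names, and folder layout are placeholders, not the MagLab's actual configuration.

```python
# Hypothetical sketch of the deposit step with osfclient; the token, project
# id, and folder layout are placeholders, not the MagLab's configuration.
from osfclient import OSF

osf = OSF(token="YOUR_OSF_TOKEN")   # personal access token (placeholder)
project = osf.project("abc12")      # placeholder OSF project id
storage = project.storage("osfstorage")

# Naming convention: station / date / run, mirroring a lab folder template.
with open("scan_0042.h5", "rb") as fp:
    storage.create_file("SCM1/2022-01-15/scan_0042.h5", fp)
```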
- Research Article
1
- 10.1093/database/baaf021
- Apr 4, 2025
- Database : the journal of biological databases and curation
Rice is a vital staple crop, sustaining over half of the global population, and is a key model for genetic research. To support the growing need for comprehensive and accessible rice genomic data, GrameneOryza (https://oryza.gramene.org) was developed as an online resource adhering to FAIR (Findable, Accessible, Interoperable, and Reusable) principles of data management. It distinguishes itself through its comprehensive multispecies focus, encompassing a wide variety of Oryza genomes and related species, and its integration with FAIR principles to ensure data accessibility and usability. It offers a community-curated selection of high-quality Oryza genomes, genetic variation, gene function, and trait data. The latest release, version 8, includes 28 Oryza genomes, covering wild rice and domesticated cultivars. These genomes, along with Leersia perrieri and seven additional outgroup species, form the basis for 38K protein-coding gene family trees, essential for identifying orthologs, paralogs, and developing pan-gene sets. GrameneOryza's genetic variation data features 66 million single-nucleotide variants (SNVs) anchored to the Os-Nipponbare-Reference-IRGSP-1.0 genome, derived from various studies, including the Rice Genome 3K (RG3K) project. The RG3K sequence reads were also mapped to seven additional platinum-quality Asian rice genomes, resulting in 19 million SNVs for each genome, significantly expanding the coverage of genetic variation beyond the Nipponbare reference. Of the 66 million SNVs on IRGSP-1.0, 27 million acquired standardized reference SNP cluster identifiers (rsIDs) from the European Variation Archive release v5. Additionally, 1200 distinct phenotypes provide a comprehensive overview of quantitative trait loci (QTL) features. The newly introduced Oryza CLIMtools portal offers insights into environmental impacts on genome adaptation. The platform's integrated search interface, along with a BLAST server and curation tools, facilitates user access to genomic, phylogenetic, gene function, and QTL data, supporting broad research applications. Database URL: https://oryza.gramene.org.