Quantitative Data Valuation Methods: A Systematic Review and Taxonomy

Abstract

Background: Data valuation is an area of study encompassing, but not limited to, data quality, machine learning, applied energy, and information economics. The primary focus of data valuation research is the development of methodologies for determining the value (relevance or worth) of data. Quantitative data valuation methods are techniques that assign a numerical value to data. Method: The methodology used for this systematic review complies with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. A search string was run on five databases: IEEE, ACM, Web of Science, Science Direct, and AISeL. Studies introducing at least one new quantitative data valuation method and published before the 29th of February 2024 were extracted and analysed. Results: In total, 75 studies were included in this review. Quantitative data valuation research is mainly driven by the Machine Learning (including federated learning and supervised learning) and Data Management communities, which account for 48% (36 studies) and 20% (15 studies) of the included studies, respectively. In the last three years (2021–2023), the number of publications has increased by a factor of at least 1.5 every year, mainly driven by the Machine Learning community. Discussion: Under the current data deluge, quantitative data valuation has become crucial for effective data management in organisations, for example to enable prioritisation of data management activities. In parallel, the rapid rise of interest in explainable and federated machine learning depends on quantitative data valuation methods to identify the most relevant data for explaining model outcomes and the most suitable federated clients to peer with. These two trends ensure that quantitative data valuation publications will continue to increase for many years to come.
The data valuation methods favoured by the Data Management and Machine Learning communities overlap but also diverge, so it is important to include both perspectives to obtain a comprehensive view of data value, data valuation, and their relationship to data quality. Conclusion: According to the studies included in this review, quantitative data valuation methods have existed for at least 25 years (2000–2024). However, such methods have received increasing interest in the last six years, mainly driven by the Machine Learning community exploring automated methods to select, or focus attention on, highly valuable (i.e., relevant) data for a model or for the explanation of a model's outputs. Quantitative data valuation has many application areas, including but not limited to data management, military operations, system security, and economics. Other: The multidisciplinary nature of data valuation as a research topic has resulted in a lack of consensus on a definition of the term "data value", with many domain-specific definitions emerging. After analysing the data value and data valuation definitions found in our literature search, we propose here a cross-domain definition of the term data value. We have also proposed a taxonomy of quantitative data valuation methods based on input types (data, metadata, bimodal) and application context to show the variety of techniques used in this field.
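The model-driven methods that dominate the Machine Learning side of this literature share one core idea: a datum's value is its contribution to a model's utility. A minimal leave-one-out sketch illustrates it (the toy dataset, 1-nearest-neighbour model, and accuracy utility below are illustrative assumptions, not taken from any included study):

```python
# Leave-one-out (LOO) data valuation sketch.
# The dataset, model (1-nearest-neighbour), and utility (test accuracy)
# are illustrative toys, not methods from the reviewed studies.

def utility(train, test):
    """Utility of a training set = accuracy of 1-NN on the test set."""
    def predict(x):
        # Label of the training point whose feature is closest to x.
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return sum(predict(x) == y for x, y in test) / len(test)

def loo_values(train, test):
    """Value of each training point = utility drop when it is removed."""
    full = utility(train, test)
    return [full - utility(train[:i] + train[i + 1:], test)
            for i in range(len(train))]

train = [(0.0, 0), (0.1, 0), (1.0, 1), (0.9, 1), (0.5, 0)]  # (feature, label)
test = [(0.05, 0), (0.95, 1), (0.6, 1)]
values = loo_values(train, test)  # the (0.5, 0) point gets a negative value
```

A negative value flags a point that hurts the model; Shapley-value-based methods refine this idea by averaging a point's marginal contribution over many training subsets rather than only the full set.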

References (showing 10 of 92 papers)
  • Who's Responsible? Jointly Quantifying the Contribution of the Learning Algorithm and Data. Gal Yona + 2 more. Jul 21, 2021. doi:10.1145/3461702.3462574
  • Task-oriented information value measurement based on space-time prisms. Yingjie Hu + 2 more. International Journal of Geographical Information Science, Dec 17, 2015. doi:10.1080/13658816.2015.1124434
  • MDV: A Multi-Factors Data Valuation Method. Xiao Ma + 1 more. Aug 1, 2019. doi:10.1109/bigcom.2019.00016
  • A Review of Data Valuation Approaches and Building and Scoring a Data Valuation Model. Mike Fleckenstein + 2 more. Harvard Data Science Review, Jan 26, 2023. doi:10.1162/99608f92.c18db966
  • Towards More Efficient Data Valuation in Healthcare Federated Learning using Ensembling. Sourav Kumar + 7 more. Distributed, collaborative, and federated learning, and affordable AI and healthcare for resource diverse global health: Third MICCAI Workshop, DeCaF 2022 and Second MICCAI Workshop, FAIR 2022, held in conjunction with MICCAI 2022, Sin..., Jan 1, 2022. doi:10.1007/978-3-031-18523-6_12
  • A Principled Approach to Data Valuation for Federated Learning. Tianhao Wang + 4 more. Jan 1, 2020. doi:10.1007/978-3-030-63076-8_11
  • Digital Information Asset Evaluation: Characteristics and Dimensions. Gianluigi Viscusi + 1 more. Jan 1, 2014. doi:10.1007/978-3-319-07040-7_9
  • Proactive Documents – A New Paradigm to Access Document Information from Various Contexts. Karlheinz E.S Toni. Jan 1, 2007. doi:10.1109/ainaw.2007.296
  • Wisdom – the blurry top of human cognition in the DIKW-model? Anett Hoppe + 3 more. Jan 1, 2011. doi:10.2991/eusflat.2011.91
  • Use of Context in Data Quality Management: A Systematic Literature Review. Flavia Serra + 3 more. Journal of Data and Information Quality, Sep 30, 2024. doi:10.1145/3672082

Similar Papers
  • Conference Article
  • 10.4230/lipics.icdt.2021.2
Comparing Apples and Oranges: Fairness and Diversity in Ranking (Invited Talk).
  • Mar 17, 2021
  • Julia Stoyanovich

Algorithmic rankers take a collection of candidates as input and produce a ranking (permutation) of the candidates as output. The simplest kind of ranker is score-based; it computes a score of each candidate independently and returns the candidates in score order. Another common kind of ranker is learning-to-rank, where supervised learning is used to predict the ranking of unseen candidates. For both kinds of rankers, we may output the entire permutation or only the highest scoring k candidates, the top-k. Set selection is a special case of ranking that ignores the relative order among the top-k. In the past few years, there has been much work on incorporating fairness and diversity requirements into algorithmic rankers, with contributions coming from the data management, algorithms, information retrieval, and recommender systems communities. In my talk I will offer a broad perspective that connects formalizations and algorithmic approaches across subfields, grounding them in a common narrative around the value frameworks that motivate specific fairness- and diversity-enhancing interventions. I will discuss some recent and ongoing work, and will outline future research directions where the data management community is well-positioned to make lasting impact, especially if we attack these problems with our rich theory-meets-systems toolkit.
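The score-based ranker described in the abstract above can be sketched in a few lines (the candidates and scoring function are illustrative placeholders):

```python
# Score-based top-k ranking: score each candidate independently,
# then return the k best in descending score order.
# The candidates and scoring function are illustrative placeholders.

def top_k(candidates, score, k):
    """Rank candidates by an independent per-candidate score; keep the top k."""
    return sorted(candidates, key=score, reverse=True)[:k]

candidates = ["ada", "grace", "alan", "edsger"]
ranking = top_k(candidates, score=len, k=2)  # toy score: name length
```

Set selection, as the abstract notes, would return the same k candidates but ignore their relative order.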

  • Conference Article
  • 10.1145/3035918.3054775
Data Management in Machine Learning
  • May 9, 2017
  • Arun Kumar + 2 more

Large-scale data analytics using statistical machine learning (ML), popularly called advanced analytics, underpins many modern data-driven applications. The data management community has been working for over a decade on tackling data management-related challenges that arise in ML workloads, and has built several systems for advanced analytics. This tutorial provides a comprehensive review of such systems and analyzes key data management challenges and techniques. We focus on three complementary lines of work: (1) integrating ML algorithms and languages with existing data systems such as RDBMSs, (2) adapting data management-inspired techniques such as query optimization, partitioning, and compression to new systems that target ML workloads, and (3) combining data management and ML ideas to build systems that improve ML lifecycle-related tasks. Finally, we identify key open data management challenges for future research in this important area.

  • Research Article
  • 10.2218/ijdc.v16i1.768
FAIR Forever? Accountabilities and Responsibilities in the Preservation of Research Data
  • Sep 30, 2021
  • International Journal of Digital Curation
  • Amy Currie + 1 more

Digital preservation is a fast-moving and growing community of practice of ubiquitous relevance, but in which capability is unevenly distributed. Within the open science and research data communities, digital preservation has a close alignment to the FAIR principles and is delivered through a complex specialist infrastructure comprising technology, staff and policy. However, capacity erodes quickly, establishing a need for ongoing examination and review to ensure that skills, technology, and policy remain fit for changing purpose. To address this challenge, the Digital Preservation Coalition (DPC) conducted the FAIR Forever study, commissioned by the European Open Science Cloud (EOSC) Sustainability Working Group and funded by the EOSC Secretariat Project in 2020, to assess the current strengths, weaknesses, opportunities and threats to the preservation of research data across EOSC, and the feasibility of establishing shared approaches, workflows and services that would benefit EOSC stakeholders.
 This paper draws from the FAIR Forever study to document and explore its key findings on the identified strengths, weaknesses, opportunities, and threats to the preservation of FAIR data in EOSC, and to the preservation of research data more broadly. It begins with background of the study and an overview of the methodology employed, which involved a desk-based assessment of the emerging EOSC vision, interviews with representatives of EOSC stakeholders, and focus groups with digital preservation specialists and data managers in research organizations. It summarizes key findings on the need for clarity on digital preservation in the EOSC vision and for elucidation of roles, responsibilities, and accountabilities to mitigate risks of data loss, reputation, and sustainability. It then outlines the recommendations provided in the final report presented to the EOSC Sustainability Working Group.
 To better ensure that research data can be FAIRer for longer, the recommendations of the study are presented with discussion on how they can be extended and applied to various research data stakeholders in and outside of EOSC, and suggest ways to bring together research data curation, management, and preservation communities to better ensure FAIRness now and in the long term.

  • Research Article
  • 10.1007/s00778-022-00775-9
Data collection and quality challenges in deep learning: a data-centric AI perspective
  • Jan 3, 2023
  • The VLDB Journal
  • Steven Euijong Whang + 3 more

Data-centric AI is at the center of a fundamental shift in software engineering where machine learning becomes the new software, powered by big data and computing infrastructure. Here, software engineering needs to be re-thought where data become a first-class citizen on par with code. One striking observation is that a significant portion of the machine learning process is spent on data preparation. Without good data, even the best machine learning algorithms cannot perform well. As a result, data-centric AI practices are now becoming mainstream. Unfortunately, many datasets in the real world are small, dirty, biased, and even poisoned. In this survey, we study the research landscape for data collection and data quality primarily for deep learning applications. Data collection is important because there is lesser need for feature engineering for recent deep learning approaches, but instead more need for large amounts of data. For data quality, we study data validation, cleaning, and integration techniques. Even if the data cannot be fully cleaned, we can still cope with imperfect data during model training using robust model training techniques. In addition, while bias and fairness have been less studied in traditional data management research, these issues become essential topics in modern machine learning applications. We thus study fairness measures and unfairness mitigation techniques that can be applied before, during, or after model training. We believe that the data management community is well poised to solve these problems.

  • Research Article
  • 10.1145/3299887.3299891
Data Lifecycle Challenges in Production Machine Learning
  • Dec 11, 2018
  • ACM SIGMOD Record
  • Neoklis Polyzotis + 3 more

Machine learning has become an essential tool for gleaning knowledge from data and tackling a diverse set of computationally hard tasks. However, the accuracy of a machine learned model is deeply tied to the data that it is trained on. Designing and building robust processes and tools that make it easier to analyze, validate, and transform data that is fed into large-scale machine learning systems poses data management challenges. Drawn from our experience in developing data-centric infrastructure for a production machine learning platform at Google, we summarize some of the interesting research challenges that we encountered, and survey some of the relevant literature from the data management and machine learning communities. Specifically, we explore challenges in three main areas of focus - data understanding, data validation and cleaning, and data preparation. In each of these areas, we try to explore how different constraints are imposed on the solutions depending on where in the lifecycle of a model the problems are encountered and who encounters them.

  • Research Article
  • 10.14778/3611540.3611573
Data and AI Model Markets: Opportunities for Data and Model Sharing, Discovery, and Integration
  • Aug 1, 2023
  • Proceedings of the VLDB Endowment
  • Jian Pei + 2 more

The markets for data and AI models are rapidly emerging and increasingly significant in the realm and the practices of data science and artificial intelligence. These markets are being studied from diverse perspectives, such as e-commerce, economics, machine learning, and data management. In light of these developments, there is a pressing need to present a comprehensive and forward-looking survey on the subject to the database and data management community. In this tutorial, we aim to provide a comprehensive and interdisciplinary introduction to data and AI model markets. Unlike a few recent surveys and tutorials that concentrate only on the economics aspect, we take a novel perspective and examine data and AI model markets as grand opportunities to address the long-standing problem of data and model sharing, discovery, and integration. We motivate the importance of data and model markets using practical examples, present the current industry landscape of such markets, and explore the modules and options of such markets from multiple dimensions, including assets in the markets (e.g., data versus models), platforms, and participants. Furthermore, we summarize the latest advancements and examine the future directions of data and AI model markets as mechanisms for enabling and facilitating sharing, discovery, and integration.

  • Conference Article
  • 10.1145/3448016.3460535
Utilizing (and Designing) Modern Hardware for Data-Intensive Computations
  • Jun 9, 2021
  • Kenneth A Ross

Modern information-intensive systems, including data management systems, operate on data that is mostly resident in RAM. As a result, the data management community has shifted focus from I/O optimization to addressing performance issues higher in the memory hierarchy. In this keynote, I will give a personal perspective of these developments, illustrated by work from my group at Columbia University. I will use the concept of abstraction as a lens through which various kinds of optimizations for modern hardware platforms can be understood and evaluated. Through this lens, some cute implementation tricks can be seen as much more than mere implementation details. I will discuss abstractions at various granularities, from single lines of code to whole programming/query languages. I will touch on software and hardware design for data-intensive computations. I will also discuss data processing in a conventional programming language, and how the data management community might contribute to the design of compilers.

  • Front Matter
  • 10.1016/j.patter.2020.100100
Artisanal and Industrial: The Different Methods of Data Creation
  • Sep 1, 2020
  • Patterns
  • Sarah Callaghan


  • Conference Article
  • 10.1145/3462462.3468878
Towards understanding end-to-end learning in the context of data
  • Jun 20, 2021
  • Wentao Wu + 1 more

Recent advances in machine learning (ML) systems have made it incredibly easier to train ML models given a training set. However, our understanding of the behavior of the model training process has not been improving at the same pace. Consequently, a number of key questions remain: How can we systematically assign importance or value to training data with respect to the utility of the trained models, may it be accuracy, fairness, or robustness? How does noise in the training data, either injected by noisy data acquisition processes or adversarial parties, have an impact on the trained models? How can we find the right data that can be cleaned and labeled to improve the utility of the trained models? Just when we start to understand these important questions for ML models in isolation recently, we now have to face the reality that most real-world ML applications are way more complex than a single ML model. In this article---an extended abstract for an invited talk at the DEEM workshop---we will discuss our current efforts in revisiting these questions for an end-to-end ML pipeline, which consists of a noise model for data and a feature extraction pipeline, followed by the training of an ML model. In our opinion, this poses a unique challenge on the joint analysis of data processing and learning. Although we will describe some of our recent results towards understanding this interesting problem, this article is more of a confession on our technical struggles and a cry for help to our data management community.

  • Research Article
  • 10.14778/3415478.3415562
Data collection and quality challenges for deep learning
  • Aug 1, 2020
  • Proceedings of the VLDB Endowment
  • Steven Euijong Whang + 1 more

Software 2.0 refers to the fundamental shift in software engineering where using machine learning becomes the new norm in software with the availability of big data and computing infrastructure. As a result, many software engineering practices need to be rethought from scratch where data becomes a first-class citizen, on par with code. It is well known that 80--90% of the time for machine learning development is spent on data preparation. Also, even the best machine learning algorithms cannot perform well without good data or at least handling biased and dirty data during model training. In this tutorial, we focus on data collection and quality challenges that frequently occur in deep learning applications. Compared to traditional machine learning, there is less need for feature engineering, but more need for significant amounts of data. We thus go through state-of-the-art data collection techniques for machine learning. Then, we cover data validation and cleaning techniques for improving data quality. Even if the data is still problematic, hope is not lost, and we cover fair and robust training techniques for handling data bias and errors. We believe that the data management community is well poised to lead the research in these directions. The presenters have extensive experience in developing machine learning platforms and publishing papers in top-tier database, data mining, and machine learning venues.

  • Conference Instance
  • 10.1145/3448016
Proceedings of the 2021 International Conference on Management of Data
  • Jun 9, 2021

This year, due to the global uncertainties, travel restrictions, and other potential difficulties associated with Covid-19, SIGMOD is being held entirely online, instead of at its originally planned location of Xi'an, Shaanxi, China. The online SIGMOD conference is complemented with a local physical event at Xi'an, primarily targeting the data management community in China. Despite the challenging times that we find ourselves in, we have an exciting technical program with outstanding research, industrial and demonstration track presentations, keynotes, tutorials, panels, and the awards session. For the first time, SIGMOD is being held round the clock, with each technical talk presented twice 12 hours apart, to better accommodate online participants from around the world. Also, for the first time, presentations are being grouped into curated sessions to give participants a cohesive, single-track experience on a variety of leading-edge topics in data management. We are using the latest technologies to keep SIGMOD vibrant, and we will be archiving most SIGMOD presentations, for those who want to review them at a later date. This year, with the approval of the SIGMOD EC, we introduced two new categories of papers in the Research Track, (a) Data Science & Engineering and (b) Applications, to complement the traditional Data Management category. Data Science & Engineering papers focused on data-intensive components of data science pipelines, solving problems in areas of interest to the community inspired by real applications. Applications papers presented novel applications of data management systems and technologies to inspire future research in the community. In the Research Track this year, we received 450 research submissions (172 for Round 1 and 278 for Round 2), which were extensively reviewed by 175 program committee members, 23 associate editors, and several external reviewers.
We accepted 188 submissions (a 41.8% acceptance rate), most of them after a revision phase that gave authors 10+ weeks to revise and resubmit their papers in response to the reviewer comments. This year we introduced a new set of detailed reviewing instructions focused on reviewing constructively, as well as redesigned review forms to promote constructive reviewing. In addition, we introduced a new step in the reviewing process, the Review Quality week, where the associate editors check reviews for certain quality criteria and probe reviewers for constructive rewrites before reviews are released to the authors. The authors were also able to provide structured feedback directly to the associate editors and the program chairs about review quality. Overall, more than 300 reviews were updated for quality during this process, leading to a higher number of revision requests. In addition to the Research Track, the Industrial Track selected 21 papers from 54 submissions; the Demonstration Track selected 27 demonstrations from 75 submissions; the Tutorial Track selected 8 tutorials from 20 submissions; and the Student Research Competition selected all 18 submissions for the second round of competition. This year, we will have two exciting keynote talks, reflecting emerging topics of great interest to the data management community: "Utilizing (and Designing) Modern Hardware for Data-Intensive Computations: The Role of Abstraction" by Kenneth A. Ross (Columbia University) and "Deep Data Integration" by Wang-Chiew Tan (Facebook AI). In addition, we will have two timely and interesting panels: "Data Management to Social Science and Back in the Future of Work" organized by Sihem Amer-Yahia (CNRS) and Senjuti Basu Roy (New Jersey Institute of Technology), and "Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?" organized by Arun Kumar (University of California, San Diego).

  • Research Article
  • 10.14778/3611479.3611527
How Large Language Models Will Disrupt Data Management
  • Jul 1, 2023
  • Proceedings of the VLDB Endowment
  • Raul Castro Fernandez + 4 more

Large language models (LLMs), such as GPT-4, are revolutionizing software's ability to understand, process, and synthesize language. The authors of this paper believe that this advance in technology is significant enough to prompt introspection in the data management community, similar to previous technological disruptions such as the advents of the world wide web, cloud computing, and statistical machine learning. We argue that the disruptive influence that LLMs will have on data management will come from two angles. (1) A number of hard database problems, namely, entity resolution, schema matching, data discovery, and query synthesis, hit a ceiling of automation because the system does not fully understand the semantics of the underlying data. Based on large training corpora of natural language, structured data, and code, LLMs have an unprecedented ability to ground database tuples, schemas, and queries in real-world concepts. We will provide examples of how LLMs may completely change our approaches to these problems. (2) LLMs blur the line between predictive models and information retrieval systems with their ability to answer questions. We will present examples showing how large databases and information retrieval systems have complementary functionality.

  • Book Chapter
  • 10.1007/978-3-642-04930-9_55
Semantic Enhancement for Enterprise Data Management
  • Jan 1, 2009
  • Li Ma + 7 more

Taking customer data as an example, the paper presents an approach to enhance the management of enterprise data by using Semantic Web technologies. Customer data is the most important kind of core business entity a company uses repeatedly across many business processes and systems, and customer data management (CDM) is becoming critical for enterprises because it keeps a single, complete and accurate record of customers across the enterprise. Existing CDM systems focus on integrating customer data from all customer-facing channels and front and back office systems through multiple interfaces, as well as publishing customer data to different applications. To make effective use of the CDM system, this paper investigates semantic query and analysis over the integrated and centralized customer data, enabling automatic classification and relationship discovery. We have implemented these features over IBM Websphere Customer Center and shown the prototype to our clients. We believe that our study and experiences are valuable for both the Semantic Web and data management communities.

  • Research Article
  • 10.31263/voebm.v72i2.2833
RDA Austria - Removing Barriers in Data Sharing
  • Aug 15, 2019
  • Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare
  • Tomasz Miksa + 4 more

Research Data Alliance Austria (RDA-AT) is a national RDA node dedicated to representing emerging research and data management communities throughout Austria. RDA-AT will operate as a formal participant of RDA Europe and RDA Global, linking Austrian data management initiatives and RDA Working and Interest Groups, providing assistance in adoption of RDA recommendations, and allowing Austrian stakeholders to benefit directly from RDA support mechanisms. These new connections will bring key Austrian issues to the global data management table, and global discussions back to Austria. This paper details the goals of the RDA-AT and describes its community and sustainability plans.

  • Conference Instance
  • 10.1145/1869389
Proceedings of the Sixth International Workshop on Data Management on New Hardware
  • Jun 7, 2010

Objective: The aim of this one-day workshop is to bring together researchers who are interested in optimizing database performance on modern computing infrastructure by designing new data management techniques and tools.

Topics of Interest: The continued evolution of computing hardware and infrastructure imposes new challenges and bottlenecks to program performance. As a result, traditional database architectures that focus solely on I/O optimization increasingly fail to utilize hardware resources efficiently. CPUs with superscalar out-of-order execution, simultaneous multi-threading, multi-level memory hierarchies, and future storage hardware (such as flash drives) impose a great challenge to optimizing database performance. Consequently, exploiting the characteristics of modern hardware has become an important topic of database systems research. The goal is to make database systems adapt automatically to the sophisticated hardware characteristics, thus maximizing performance transparently to applications. To achieve this goal, the data management community needs interdisciplinary collaboration with computer architecture, compiler and operating systems researchers. This involves rethinking traditional data structures, query processing algorithms, and database software architectures to adapt to the advances in the underlying hardware infrastructure.

Paper Selection: The seven papers included in the workshop were chosen by the program committee from among sixteen high-quality submissions, following a review process in which each paper received at least three reviews. Based on the reviews, one paper was selected by the workshop chairs as the recipient of the "Best Paper" award. This year, the award goes to "Wimpy Node Clusters: What About Non-Wimpy Workloads?", by Willis Lang (University of Wisconsin); Jignesh M. Patel (University of Wisconsin); Srinath Shankar (Microsoft Corp.).

Workshop Program: Eight technical papers were presented at the workshop.
The workshop also featured a keynote talk by Evangelos Eleftheriou, IBM Fellow, IBM Zurich Lab. Additionally, the workshop included a panel on current challenges in cloud storage, moderated by Anastasia Ailamaki.

More from: Journal of Data and Information Quality
  • Research Article
  • 10.1145/3774755
The BigFAIR Architecture: Enabling Big Data Analytics in FAIR-compliant Repositories
  • Nov 6, 2025
  • Journal of Data and Information Quality
  • João Pedro De Carvalho Castro + 3 more

  • Research Article
  • 10.1145/3770753
A GenAI System for Improved FAIR Independent Biological Database Integration
  • Oct 14, 2025
  • Journal of Data and Information Quality
  • Syed N Sakib + 3 more

  • Research Article
  • 10.1145/3770750
Ontology-Based Schema-Level Data Quality: The Case of Consistency
  • Oct 9, 2025
  • Journal of Data and Information Quality
  • Gianluca Cima + 2 more

  • Research Article
  • 10.1145/3769113
xFAIR: A Multi-Layer Approach to Data FAIRness Assessment and Data FAIRification
  • Oct 9, 2025
  • Journal of Data and Information Quality
  • Antonella Longo + 4 more

  • Research Article
  • 10.1145/3769116
FAIRness of the Linguistic Linked Open Data Cloud: an Empirical Investigation
  • Oct 9, 2025
  • Journal of Data and Information Quality
  • Maria Angela Pellegrino + 2 more

  • Research Article
  • 10.1145/3769120
Sustainable quality in data preparation
  • Oct 9, 2025
  • Journal of Data and Information Quality
  • Barbara Pernici + 12 more

  • Research Article
  • 10.1145/3769264
Editorial: Special Issue on Advanced Artificial Intelligence Technologies for Multimedia Big Data Quality
  • Sep 30, 2025
  • Journal of Data and Information Quality
  • Shaohua Wan + 3 more

  • Research Article
  • 10.1145/3743144
A Language to Model and Simulate Data Quality Issues in Process Mining
  • Jun 28, 2025
  • Journal of Data and Information Quality
  • Marco Comuzzi + 2 more

  • Research Article
  • 10.1145/3736178
Quantitative Data Valuation Methods: A Systematic Review and Taxonomy
  • Jun 24, 2025
  • Journal of Data and Information Quality
  • Malick Ebiele + 2 more

  • Research Article
  • 10.1145/3735511
Graph Metrics-driven Record Cluster Repair meets LLM-based active learning
  • Jun 24, 2025
  • Journal of Data and Information Quality
  • Victor Christen + 4 more
