Anomaly-based Assessment of Semantic Consistency: Design and Evaluation of a Novel Probability-based Metric in Cooperation with a German Car Manufacturer

Abstract

In the era of big data, the rate at which data is produced and analyzed is increasing steadily. However, the occurrence of conflicting data can significantly deteriorate the quality of analyses and diminish the potential benefits of big data. Thus, ensuring the semantic consistency of data is essential. Existing approaches for assessing semantic consistency are typically based on explicit rules, which cannot be generated efficiently for high-dimensional big data. Furthermore, such rules are typically regarded as “true by definition”, disregarding the uncertainty introduced by natural process fluctuations. Against this background, we propose a novel, probability-based metric that provides interpretable metric values representing the probability that a record in a database is semantically consistent. Unlike existing approaches, our metric is not based on explicit rules but on the equivalence between internal contradictions and anomalies within the data. Because it builds on the isolation forest, a highly scalable anomaly detection method, it can handle high-dimensional big data. To account for the associated uncertainty, we combine the idea of the isolation forest with the Grubbs outlier test to assess the probability that a record is semantically consistent (i.e., not an anomaly). We demonstrate the practical applicability of our metric and evaluate its values using production data from a German car manufacturer as well as publicly available benchmark data for fault diagnosis. The evaluation shows that our metric distinguishes well between semantically consistent and inconsistent data.
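
To make the core idea concrete, the following sketch shows one way an isolation forest score could be turned into a per-record consistency probability via a Grubbs-style outlier test. It relies on scikit-learn's IsolationForest and SciPy's t-distribution; the particular combination used here (a Bonferroni-corrected Grubbs p-value computed per anomaly score) is an illustrative assumption, not the authors' implementation.

# Illustrative sketch only: anomaly scores from an isolation forest are mapped,
# via a Grubbs-style outlier test on each score, to a value that can be read as
# the probability that a record is NOT an anomaly. Function and parameter names
# are chosen here for illustration; this is not the paper's implementation.
import numpy as np
from scipy import stats
from sklearn.ensemble import IsolationForest


def consistency_probabilities(X, n_estimators=100, random_state=0):
    """Approximate, per record, the probability of being semantically
    consistent (i.e., not an anomaly), under the assumptions stated above."""
    forest = IsolationForest(n_estimators=n_estimators, random_state=random_state)
    forest.fit(X)
    # score_samples: higher means more normal; negate so higher means more anomalous.
    scores = -forest.score_samples(X)

    n = len(scores)
    mean, std = scores.mean(), scores.std(ddof=1)
    # Per-record Grubbs statistic on the anomaly score.
    g = np.abs(scores - mean) / std
    # Equivalent t statistic used by the Grubbs test (denominator clamped for stability).
    t = np.sqrt((n * (n - 2) * g**2) / np.maximum((n - 1) ** 2 - n * g**2, 1e-12))
    # Two-sided, Bonferroni-corrected Grubbs p-value, capped at 1. A large value
    # means the score is not an outlier, read here as high consistency.
    return np.minimum(2 * n * stats.t.sf(t, df=n - 2), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(size=(500, 8))               # regular production records
    conflicting = rng.normal(loc=6.0, size=(5, 8))   # injected inconsistencies
    probs = consistency_probabilities(np.vstack([normal, conflicting]))
    print(f"typical record: {probs[0]:.3f}  injected record: {probs[-1]:.3f}")

On synthetic data like this, records from the bulk of the distribution should receive values close to 1, while strongly conflicting records should receive values close to 0, mirroring the intended reading of the metric as a probability of semantic consistency.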

Similar Papers
  • Research Article
  • Citations: 3
  • 10.52214/vib.v7i.8403
Legal Governance of Brain Data Derived from Artificial Intelligence
  • Jun 2, 2021
  • Voices in Bioethics
  • Mahika Ahluwalia

  • Research Article
  • Citations: 25
  • 10.1016/j.cie.2020.106643
Root cause analysis approach based on reverse cascading decomposition in QFD and fuzzy weight ARM for quality accidents
  • Jul 7, 2020
  • Computers & Industrial Engineering
  • Panting Duan + 5 more

  • Research Article
  • Citations: 21
  • 10.1109/tii.2021.3073645
Rethinking the Value of Just-in-Time Learning in the Era of Industrial Big Data
  • Feb 1, 2022
  • IEEE Transactions on Industrial Informatics
  • Zeyu Yang + 1 more

Just-in-time learning (JITL) has become a widely used industrial process modeling tool. With the advent of the industrial big data era, rich data information has brought new opportunities to JITL. Specifically, the completeness of data samples in the era of big data provides an important premise and support for the JITL method, prompting us to rethink the application value of JITL in the context of industrial big data. At the same time, the huge amount of data causes certain difficulties in data searching, which is a key issue for JITL. In this article, a parallel computing strategy is adopted to divide the entire computational searching task into several subtasks, and assign the subtasks to parallel computing nodes to complete parallel searching. In this way, not only the full advantage of big data information is effectively utilized, but also the search capability and efficiency under big data are improved. In addition, in order to improve the real-time nature of JITL, a model library management (MLM) strategy is adopted, and the query similar samples are used to operate with existing similar models. And by selectively adding new data, the database management (DBM) strategy is also developed, which not only alleviates the problem of information redundancy, but also reduces the search pressure caused by the increasingly large database. Obviously, MLM and DBM are particularly important as JITL auxiliary tools under industrial big data. Combining parallel computing, a parallel JITL (P-JITL) framework is proposed. As an example, the variational Bayesian factor regression model is transformed into the parallel Bayesian-JITL method for big process data modeling, which is further extended to a nonlinear form. To evaluate the feasibility and efficiency of the developed methods, a real industrial case is demonstrated.
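
The search step described in this abstract can be illustrated with a short partition-and-merge sketch. The parallel backend (joblib) and the Euclidean distance measure are stand-ins chosen here for brevity; they are not necessarily the tools used in the paper.

# Illustrative sketch of the partition-and-search idea behind parallel JITL:
# the historical database is split into chunks, each chunk is searched in
# parallel for the samples most similar to the query, and the partial results
# are merged. Backend (joblib) and distance measure are assumptions.
import numpy as np
from joblib import Parallel, delayed


def _search_chunk(chunk, query, k):
    # Euclidean distance between the query and every sample in this chunk.
    d = np.linalg.norm(chunk - query, axis=1)
    idx = np.argsort(d)[:k]
    return chunk[idx], d[idx]


def parallel_similar_samples(database, query, k=20, n_jobs=4):
    """Return the k historical samples most similar to `query`."""
    chunks = np.array_split(database, n_jobs)
    partial = Parallel(n_jobs=n_jobs)(
        delayed(_search_chunk)(c, query, k) for c in chunks
    )
    samples = np.vstack([s for s, _ in partial])
    dists = np.concatenate([d for _, d in partial])
    best = np.argsort(dists)[:k]
    return samples[best]


rng = np.random.default_rng(0)
history = rng.normal(size=(100_000, 12))   # stand-in for historical process data
query = rng.normal(size=12)                # current operating point
print(parallel_similar_samples(history, query).shape)   # (20, 12)

A local model would then be fitted on the returned neighbourhood to predict the query's output, as in conventional just-in-time learning.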

  • Book Chapter
  • Citations: 1
  • 10.1007/978-3-030-62743-0_40
Wisdom Media Era of Big Data in the Application of the Short Video from the Media
  • Nov 4, 2020
  • Zhi Li

In today’s era, with the wide application of artificial intelligence (AI), media applications are gradually becoming intelligent, and the application of big data in the era of smart media has become an indispensable part of social life. The era of big data has promoted the development and progress of society and made people’s lives more convenient. Accordingly, new media such as short video “we media” have sprung up and entered the public’s view. Drawing on the application of big data in the era of smart media, this paper analyzes the development trend of short video “we media” in social life, analyzes the influence of the big data era on short video, and reflects the general advantages of the smart media era through this discussion. The paper discusses the entertainment and convenience that short video “we media” bring to people’s daily lives in the era of smart media. Based on the problems and challenges encountered in applying smart media in some fields, it puts forward specific plans for protecting people’s personal information in the era of big data. The purpose of the paper is to reveal how smart media and big data are applied to short video in this era and to review the corresponding research. Building on this discussion, the paper combines big data analysis with relevant theory from the field of news media, combines intelligence with “we media”, and studies the value of “we media” short videos for social development in the era of smart media. The results show that the wide application of big data in the era of smart media is a rapidly developing trend in today’s society; within this trend, the popularity of short video “we media” continues to grow, which not only brings people closer together but also accelerates the development of a modern intelligent society and improves traditional modes of media transmission.

  • Conference Article
  • Citations: 29
  • 10.1109/bigdata.2018.8622260
Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data
  • Dec 1, 2018
  • Carson K Leung + 4 more

As we are living in the era of big data, high volumes of a wide variety of data, which may be of different veracity (e.g., precise data, imprecise and uncertain data), are easily generated or collected at high velocity in many real-life applications. Embedded in these big data are valuable knowledge and useful information, which can be discovered by big data science solutions. As a popular data science task, frequent pattern mining aims to discover implicit, previously unknown and potentially useful information and valuable knowledge in terms of sets of frequently co-occurring merchandise items and/or events. Many existing frequent pattern mining algorithms use a transaction-centric mining approach to find frequent patterns in precise data. However, there are situations in which an item-centric mining approach is more appropriate, and there are also situations in which data are imprecise and uncertain. Hence, in this paper, we present an item-centric algorithm for mining frequent patterns from big uncertain data. In recent years, big data have been gaining attention from the research community, driven by relevant technological innovations (e.g., clouds) and novel paradigms (e.g., social networks). As big data are typically published online to support knowledge management and fruition processes, they are usually handled by multiple owners, with possible secure multi-party computation issues. Thus, the privacy and security of big data have become a fundamental problem in this research context. In this paper, we therefore present not only an item-centric algorithm for mining frequent patterns from big uncertain data, but a privacy-preserving one: a privacy-preserving item-centric algorithm for mining frequent patterns from big uncertain data. Results of our analytical and empirical evaluation show the effectiveness of our algorithm in mining frequent patterns from big uncertain data in a privacy-preserving manner.
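
The uncertain-data setting mentioned in this abstract is commonly formalized through expected support: each item in a transaction carries an existential probability, and an itemset's expected support is the sum over transactions of the product of its items' probabilities. The sketch below shows only this support notion with brute-force candidate enumeration; it implements neither the paper's item-centric algorithm nor its privacy-preserving mechanism.

# Illustrative sketch of expected support over uncertain transactions; not the
# paper's algorithm. Each transaction maps items to existential probabilities.
from itertools import combinations
from math import prod


def expected_support(itemset, transactions):
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            total += prod(t[item] for item in itemset)
    return total


def frequent_itemsets(transactions, min_expected_support, max_size=2):
    # Brute-force enumeration of candidates up to max_size (no Apriori pruning).
    items = sorted({i for t in transactions for i in t})
    return [
        candidate
        for size in range(1, max_size + 1)
        for candidate in combinations(items, size)
        if expected_support(candidate, transactions) >= min_expected_support
    ]


# Probabilities express how certain we are that an item occurred in a transaction.
transactions = [
    {"a": 0.9, "b": 0.7},
    {"a": 0.8, "c": 1.0},
    {"a": 0.6, "b": 0.5, "c": 0.4},
]
print(frequent_itemsets(transactions, min_expected_support=1.0))
# -> [('a',), ('b',), ('c',), ('a', 'c')]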

  • Research Article
  • Citations: 5
  • 10.1016/j.neucom.2016.06.080
HB-File: An efficient and effective high-dimensional big data storage structure based on US-ELM
  • Feb 16, 2017
  • Neurocomputing
  • Linlin Ding + 4 more

  • Research Article
  • Citations: 10
  • 10.1360/n972015-01035
Discussion on geological science big data and its applications
  • May 16, 2016
  • Chinese Science Bulletin
  • Chonglong Wu + 4 more

Geological big data is a kind of spatio-temporal big data whose characteristics have both similarities to and marked differences from general big data. Owing to the long-term evolution of geological objects over vast spaces and the influence of various geological processes, its development undergoes a complicated history. Geological bodies are buried deep underground, so geological data features incomplete information about parameters, structure, relationships and evolution, together with high computational complexity, high dimensionality and significant uncertainty. To address these challenges, the theories, methods and techniques of big data should be introduced to integrate and utilize geological science big data. It is necessary to set up a unified spatial reference system for geological spatio-temporal big data to realize consistent data processing and fusion with regard to spatial benchmarks, tense, scale and semantics. Various integrated storage and management methods for static geological exploration data and dynamic, distributed geological observation data will be developed. On the one hand, current big data techniques can be adapted, such as real-time data storage and management, big data storage and management architectures, and analysis and processing architectures. On the other hand, specialized big data techniques should be developed for the actual situation, such as distributed parallel spatio-temporal indexing for geological data and pre-dispatch methods that consider spatio-temporal relationships and geological semantics. To judge whether a traditional algorithm is suitable for the big data environment, the main criteria include task decomposability, data decomposability and the relevance between data flow segments. Hence, the transformation of traditional geological data processing methods should proceed from these three aspects, exploiting task parallelism and geometric decomposability and reducing the relevance between data flow segments as much as possible, to adapt to the distributed and high-performance computing of the big data environment. Big data techniques, which may be a breakthrough in geological science, can mine knowledge directly from massive spatio-temporal and text data despite the limits of geological sampling methods, such as the randomness and sparsity of the sample space, judgments based on inadequate observational data and fixed modes, and traditional data analysis methods. A series of mathematical geology methods and spatial data mining methods that are frequently used today can be applied to geological spatio-temporal big data mining after being remolded. The key scientific and technological issues for geological science in the big data era include the integrated storage, management and handling of structured, semi-structured and unstructured data, big and small data, hybrid and precision data, models and data, static exploration models and dynamic monitoring models, the combination of data mining and data analysis, the unification of correlation and causality, deep mining and visualization of geological science big data, and so on. Taking metal mineral metallogenic prediction as an example, the utilization and analysis of geological big data are studied. From a global view, Glass Earth is the underground extension of Digital Earth. Glass Earth is a 3D visualized virtual shallow crust, an aggregation of geological and geographical information stored on the computer network so that it can be accessed by multiple users and provide analysis and decision-making services for research and applications in geology, resources and the environment. Glass Earth is an effective vehicle for geological science big data, and its construction is one of the best ways to solve the aforementioned scientific and technical problems.

  • Research Article
  • 10.1108/jaoc-08-2024-0255
Big data analytics role in shaping the work of accounting function and accounting professionals
  • Sep 15, 2025
  • Journal of Accounting & Organizational Change
  • Abhishek Mukherjee + 2 more

Purpose This paper aims to examine how big data can be applied in accounting and identifies the competences future accountants at organizational level need to possess to remain competitive. Design/methodology/approach This study used the association of chartered certified accountants (ACCA’s) framework and the technological, organization and environmental (TOE) theoretical framework to identify the skills and capabilities urgently needed by accounting professionals at the organizational level in the context of big data analytics. The authors conducted comprehensive content analysis of academic literature issues from professional accounting bodies, and reports of accounting practitioners to gain insights of competencies that accounting professionals at organizational level need to possess to be prominent in the big data era. Findings The findings show that in the big data era accounting professionals should acquire the seven competences to be competitive. The competencies are (1) skills in data analytic techniques and the use of statistical models; (2) knowledge of the business; (3) skills to control the risk from the emerging technologies; (4) skills and knowledge in accounting profession; (5) skills to understand the insights from big data analytics; (6) skills to test the quality, veracity and integrity of data; and (7) skills to communicate with others. The study also found that these competences do not exist alone. These competences are closely related to each other and complement each other, which means that accounting professionals cannot expect to stand out in the future competition by mastering only one or two of these seven competences. Research limitations/implications The application of big data analytics is still in its infancy and its effects have yet to be confirmed so that there is no reliable case to collect and analyse. One limitation is the absence of practical case in this paper. The findings will help accounting professionals to understand the future of work. Practical implications The findings will help accounting professionals to acquire new skills and knowledge in related areas and remain competitive in the post big data era. Social implications This study highlights skills necessary for accountants to thrive in a big data era, emphasizing the shift from traditional accounting functions to roles that integrate advanced data capabilities and strategic insight. Originality/value This is one of the few papers that seek to explore the competencies required by accounting professional at organizational level in the big data era using the ACCA’s research of accountant’s professional quotients and TOE theoretical framework. This paper seeks to contribute to the literature on accounting professionals’ competencies at organizational level in the big data era. Big data analytics in accounting is influencing the evolution of accounting professionals’ competencies and is a key interest in accounting change literature. The nature of accounting work will undergo fundamental changes due to big data. Yet we know little about this due to big data being in its infancy stage in the accounting literature.

  • Research Article
  • Citations: 2
  • 10.1088/1742-6596/1533/3/032061
Digital Management of Sports Industry Based on Big Data Era
  • Apr 1, 2020
  • Journal of Physics: Conference Series
  • Xiaodong Wang

In the era of big data, the sports industry is facing transformation and upgrading. Big data has become the key technology to lead the cross-border sports industry and strengthen the sports industry. In the era of big data, the development direction of sports industry is based on the development concept of big data sharing of sports industry, relying on the refinement and personalized service of big data to the market, aiming to build a sports industry chain of industrial integration and build a big data platform of sports industry, so as to realize online sports consumption and offline sports experience. Based on the above background, this paper aims to study the digital management of sports industry in the era of big data. With the maturity of big data technology, the combination of big data and sports industry has become inevitable. Based on the simple analysis of the characteristics and background of the era of big data, this paper discusses the opportunities and challenges faced by the big data sports industry, and proposes to promote the research of digital management of sports industry from the aspects of improving data storage and processing capacity. In the era of big data, sports industry management should be able to use big data technology to analyze valuable information from massive data. In the era of big data, the storage of massive information requires data analysis and processing, and timely feedback of information, which all bring great challenges to the sports industry.

  • Conference Article
  • Citations: 5
  • 10.1109/fuzz-ieee.2019.8858943
Do we still need fuzzy classifiers for Small Data in the Era of Big Data?
  • Jun 1, 2019
  • Mikel Elkano + 2 more

The Era of Big Data has forced researchers to explore new distributed solutions for building fuzzy classifiers, which often introduce approximation errors or make strong assumptions to reduce computational and memory requirements. As a result, Big Data classifiers might be expected to be inferior to those designed for standard classification tasks (Small Data) in terms of accuracy and model complexity. To our knowledge, however, there is no empirical evidence to confirm such a conjecture yet. Here, we investigate the extent to which state-of-the-art fuzzy classifiers for Big Data sacrifice performance in favor of scalability. To this end, we carry out an empirical study that compares these classifiers with some of the best performing algorithms for Small Data. Assuming the latter were generally designed for maximizing performance without considering scalability issues, the results of this study provide some intuition around the tradeoff between performance and scalability achieved by current Big Data solutions. Our findings show that, although slightly inferior, Big Data classifiers are gradually catching up with state-of-the-art classifiers for Small data, suggesting that a unified learning algorithm for Big and Small Data might be possible.

  • Conference Article
  • Citations: 5
  • 10.1109/phm.2016.7819776
Big data oriented root cause identification approach based on PCA and SVM for product infant failure
  • Oct 1, 2016
  • Zhenzhen He + 2 more

Due to the increasing complexity of manufacturing and the large number of uncontrolled operational factors, products often exhibit an exceptionally high infant failure rate, and identifying the root causes of infant failures is a challenging issue for manufacturers. In the era of big data, large amounts of data can easily be collected across the product life cycle, but these high-dimensional data carry much uncorrelated noise, so that most current small-data-driven methods suffer from limited accuracy and excessive model-training time. Furthermore, traditional analytic techniques oriented toward small data are not applicable in the new big data environment. To resolve this dilemma, this paper proposes a new method for identifying the root causes of infant failures from big data using principal component analysis (PCA) and a support vector machine (SVM). First, data collected from design, manufacturing, and usage related to infant failure mechanisms is divided into training and test data. Second, PCA is applied to eliminate redundancy and reduce the dimensionality of the original process feature parameters, so that key variables can be extracted as potential root cause candidates. Third, an SVM-based optimal hyperplane separates the candidates' feature data, and a one-versus-all SVM classifier with a radial basis kernel identifies the final list of root causes of infant failure. Finally, the feasibility and validity of the proposed method are demonstrated through a case study of computer board failure analysis.
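
The procedure described here (train/test split, PCA for dimensionality reduction, a one-versus-all SVM with an RBF kernel, evaluation on held-out data) maps naturally onto a short scikit-learn pipeline. The sketch below uses synthetic data and placeholder parameters; it illustrates the shape of the pipeline, not the paper's actual settings or results.

# Illustrative sketch of the described pipeline: PCA followed by a
# one-versus-rest SVM with an RBF kernel. Data and parameters are placeholders.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for design/manufacturing/usage features labelled with failure root causes.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),                       # keep 95% of explained variance
    OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")),
)
model.fit(X_train, y_train)
print("hold-out accuracy:", round(model.score(X_test, y_test), 3))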

  • Conference Article
  • 10.5555/2694476.2694509
Making online BigData small: reducing computation cost and latency in web analytics through sampling
  • Dec 19, 2013
  • Aparna Das + 1 more

In the era of big data, the volume, velocity and variety of data are exploding at an unprecedented pace. With the explosion of data at the web, internet companies are working towards building powerful data analytics system that can crunch the big user data available to them and offer rich business insights. However, generating analytical reports and insights from high dimensional online user data involves significant computation cost. In addition, enterprises are now also looking for low latency analytics systems to dramatically accelerate the time from data to decision by minimizing the delay between the transaction and decision. In this work, we attempt to create smaller samples from the actual online big user data, so that analytical reports can be derived from the sample data faster and at a reduced computation cost without significant trade-offs in precision and accuracy. This study empirically analyzes petabytes of traffic data across Yahoo sites and develops efficient sampling and metric computation mechanisms for large scale web traffic. We generated analytical reports containing traffic and user engagement metrics from both sampled and actual Yahoo web data and compared them in terms of latency, computation cost and accuracy to show the effectiveness of our approach.
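
The core idea, computing a metric on a small sample and checking how far it deviates from the full-data value, can be illustrated in a few lines. The data layout (one row per page view with a user identifier) and the 1% user-level sampling rate are assumptions made for this sketch, not details taken from the paper.

# Illustrative sketch: compute a traffic metric on a uniform 1% sample of users
# and compare it with the full-data value. Data layout and rate are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic stand-in for web traffic logs: one row per page view.
views = pd.DataFrame({"user_id": rng.integers(0, 100_000, size=1_000_000)})

# Full-data metric: average page views per user.
full_metric = views.groupby("user_id").size().mean()

# User-level sampling keeps each sampled user's behaviour intact.
sampled_users = views["user_id"].drop_duplicates().sample(frac=0.01, random_state=0)
sample_metric = views[views["user_id"].isin(sampled_users)].groupby("user_id").size().mean()

print(f"full: {full_metric:.3f}  sampled: {sample_metric:.3f}  "
      f"relative error: {abs(sample_metric - full_metric) / full_metric:.2%}")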

  • Conference Article
  • 10.12783/dtem/mebit2021/35649
RESEARCH ON THE INTERNET PLUS PROMOTING TAX LAW REFORM AND INNOVATION IN THE ERA OF BIG DATA
  • Jun 19, 2021
  • Quan Tao

In the era of big data, it is particularly necessary to use Internet plus technology to promote tax law teaching reform and innovation. This paper analyzes the current situation, points out the existing problems, and puts forward specific measures for Internet plus promoting tax reform and innovation in the era of big data, so as to facilitate the reform and development of tax law teaching in Chinese universities. In the era of big data, with the rapid development of computer technology and the continuous improvement of Internet plus, people's lives have entered the information society. People are increasingly demanding information, big data has become a hot term, and infiltrate into various industries. Nowadays, science and technology are developed, big data information is unblocked, people communicate more closely, life is more convenient, big data is the product of high-tech era.“ Internet plus” is the new form of Internet development under the innovation, which promotes social and economic entities and becomes a platform for reform, innovation and development. Internet plus relying on Internet technology to integrate Internet and traditional industries, optimize production factors and restructure business models to achieve economic transformation and upgrading, give full play to the advantages of the Internet, upgrade industrial productivity and increase social wealth by upgrading industries. Through the characteristics of openness, equality and interaction, the Internet uses big data analysis to transform the traditional industrial model and enhance the power of economic development, so as to promote the healthy and rapid development of national economy [1]. Based on the background of Internet plus in the era of big data, this article discusses the development and innovation of tax law teaching reform in our country. It expects that the traditional tax law teaching mode can help students integrate new teaching and big data technology with the help of big data technology and Internet plus platform, and better train new talents suitable for the big data era. It is a contribution to the construction and economic development of "double first-class" in universities in China.

  • Front Matter
  • Citations: 10
  • 10.1016/j.adaj.2015.09.002
Taking a byte out of big data
  • Oct 26, 2015
  • The Journal of the American Dental Association
  • Michael Glick

  • Research Article
  • Citations: 3
  • 10.1266/ggs.16-00068
An artificial intelligence approach fit for tRNA gene studies in the era of big sequence data.
  • Jan 1, 2017
  • Genes & genetic systems
  • Yuki Iwasaki + 4 more

Unsupervised data mining capable of extracting a wide range of knowledge from big data without prior knowledge or particular models is a timely application in the era of big sequence data accumulation in genome research. By handling oligonucleotide compositions as high-dimensional data, we have previously modified the conventional self-organizing map (SOM) for genome informatics and established BLSOM, which can analyze more than ten million sequences simultaneously. Here, we develop BLSOM specialized for tRNA genes (tDNAs) that can cluster (self-organize) more than one million microbial tDNAs according to their cognate amino acid solely depending on tetra- and pentanucleotide compositions. This unsupervised clustering can reveal combinatorial oligonucleotide motifs that are responsible for the amino acid-dependent clustering, as well as other functionally and structurally important consensus motifs, which have been evolutionarily conserved. BLSOM is also useful for identifying tDNAs as phylogenetic markers for special phylotypes. When we constructed BLSOM with 'species-unknown' tDNAs from metagenomic sequences plus 'species-known' microbial tDNAs, a large portion of metagenomic tDNAs self-organized with species-known tDNAs, yielding information on microbial communities in environmental samples. BLSOM can also enhance accuracy in the tDNA database obtained from big sequence data. This unsupervised data mining should become important for studying numerous functionally unclear RNAs obtained from a wide range of organisms.
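
The feature representation described here, tetra- and pentanucleotide compositions treated as high-dimensional vectors, is straightforward to sketch. Below, sequences are converted to tetranucleotide frequency vectors and clustered with k-means as a simple stand-in for the batch-learning SOM (BLSOM); the toy sequences and the clustering substitute are illustrative assumptions.

# Illustrative sketch: represent sequences by tetranucleotide (4-mer) frequency
# vectors and cluster them. k-means is used purely as a simple stand-in for the
# batch-learning SOM (BLSOM) described in the abstract.
from itertools import product

import numpy as np
from sklearn.cluster import KMeans

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]   # 256 tetranucleotides
KMER_INDEX = {k: i for i, k in enumerate(KMERS)}


def kmer_frequencies(seq, k=4):
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in KMER_INDEX:            # skips k-mers containing ambiguous bases
            counts[KMER_INDEX[kmer]] += 1
    return counts / max(counts.sum(), 1)


# Toy sequences; real input would be tRNA gene sequences.
sequences = ["ACGTACGTGGCCTTAA", "ACGTACGTGGCCTTAG", "TTTTAAAACCCCGGGG"]
X = np.vstack([kmer_frequencies(s) for s in sequences])
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))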

More from: Journal of Data and Information Quality
  • Research Article
  • 10.1145/3774755
The BigFAIR Architecture: Enabling Big Data Analytics in FAIR-compliant Repositories
  • Nov 6, 2025
  • Journal of Data and Information Quality
  • João Pedro De Carvalho Castro + 3 more

  • Research Article
  • 10.1145/3770753
A GenAI System for Improved FAIR Independent Biological Database Integration
  • Oct 14, 2025
  • Journal of Data and Information Quality
  • Syed N Sakib + 3 more

  • Research Article
  • 10.1145/3770750
Ontology-Based Schema-Level Data Quality: The Case of Consistency
  • Oct 9, 2025
  • Journal of Data and Information Quality
  • Gianluca Cima + 2 more

  • Research Article
  • 10.1145/3769113
xFAIR: A Multi-Layer Approach to Data FAIRness Assessment and Data FAIRification
  • Oct 9, 2025
  • Journal of Data and Information Quality
  • Antonella Longo + 4 more

  • Research Article
  • 10.1145/3769116
FAIRness of the Linguistic Linked Open Data Cloud: an Empirical Investigation
  • Oct 9, 2025
  • Journal of Data and Information Quality
  • Maria Angela Pellegrino + 2 more

  • Research Article
  • 10.1145/3769120
Sustainable quality in data preparation
  • Oct 9, 2025
  • Journal of Data and Information Quality
  • Barbara Pernici + 12 more

  • Research Article
  • 10.1145/3769264
Editorial: Special Issue on Advanced Artificial Intelligence Technologies for Multimedia Big Data Quality
  • Sep 30, 2025
  • Journal of Data and Information Quality
  • Shaohua Wan + 3 more

  • Research Article
  • 10.1145/3743144
A Language to Model and Simulate Data Quality Issues in Process Mining
  • Jun 28, 2025
  • Journal of Data and Information Quality
  • Marco Comuzzi + 2 more

  • Research Article
  • 10.1145/3736178
Quantitative Data Valuation Methods: A Systematic Review and Taxonomy
  • Jun 24, 2025
  • Journal of Data and Information Quality
  • Malick Ebiele + 2 more

  • Research Article
  • 10.1145/3735511
Graph Metrics-driven Record Cluster Repair meets LLM-based active learning
  • Jun 24, 2025
  • Journal of Data and Information Quality
  • Victor Christen + 4 more
