Methodical Detection of Silent Data Errors from Foundry to Fleet
Silent Data Errors (SDE) represent a significant challenge for at-scale computing infrastructure because of their potential to cause data loss or corruption. Identifying and removing SDE-prone defective devices before they affect customer workloads is crucial, but detecting these events is difficult because of their subtle nature and the specific conditions required to trigger them. In this work, we investigate the underlying defect mechanisms responsible for SDE and formulate an end-to-end framework for efficient detection, estimation of quality metrics, and use of fault models to capture the marginal defects that may cause SDE. We discuss silicon defect characterization, enhancements to manufacturing and system-level test, SDE estimation metrics, and reinforcement learning, along with in-field testing that detects latent defects and wear-out that may manifest as SDE.
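For flavor, a minimal sketch of the kind of in-field screen such a framework might run: execute a deterministic compute kernel on each CPU core and compare against a reference result, flagging mismatches as possible SDE. The kernel, the Linux-only affinity calls, and the single-reference comparison are illustrative assumptions, not the paper's actual tests.

```python
# Minimal sketch of an in-field silent-data-error screen: run a
# deterministic integer kernel pinned to each core and compare against a
# reference result. Illustrative only; real frameworks use precomputed
# known-good references rather than a result computed on one local core.
import os

def kernel(seed: int, iters: int = 100_000) -> int:
    """Deterministic mixed arithmetic; any single bit flip changes the result."""
    x = seed
    for i in range(iters):
        x = (x * 6364136223846793005 + 1442695040888963407) & (2**64 - 1)
        x ^= (x >> 29) * (i | 1)
        x &= 2**64 - 1
    return x

def screen_cores(seed: int = 42) -> list[int]:
    golden = kernel(seed)                            # reference result
    suspect = []
    for core in sorted(os.sched_getaffinity(0)):     # Linux-only affinity API
        os.sched_setaffinity(0, {core})              # pin to a single core
        if kernel(seed) != golden:                   # mismatch => possible SDE
            suspect.append(core)
    return suspect

if __name__ == "__main__":
    print("suspect cores:", screen_cores())
```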
- Research Article
83
- 10.1109/tpds.2013.2295810
- Feb 1, 2015
- IEEE Transactions on Parallel and Distributed Systems
Big sensor data is prevalent in both industry and scientific research, where data is generated with such high volume and velocity that it is difficult to process using on-hand database management tools or traditional data processing applications. Cloud computing provides a promising platform for addressing this challenge, as it offers a flexible stack of massive computing, storage, and software services in a scalable manner at low cost. Techniques such as sensor-cloud have been developed in recent years for processing sensor data on the cloud; however, they do not provide efficient support for fast detection and localization of errors in big sensor data sets. In this paper, we develop a novel data error detection approach for big sensor data that exploits the full computational potential of the cloud platform and the network features of a WSN. First, a set of sensor data error types is classified and defined. Based on that classification, the network feature of a clustered WSN is introduced and analyzed to support fast error detection and localization. Specifically, in our proposed approach, error detection is based on the scale-free network topology, and most detection operations can be conducted in limited temporal or spatial data blocks rather than over the whole big data set, so the detection and localization process is dramatically accelerated. Furthermore, the detection and localization tasks can be distributed across the cloud platform to fully exploit its computational power and massive storage. Experiments on our U-Cloud cloud computing platform demonstrate that the proposed approach significantly reduces the time for error detection and localization in big data sets generated by large-scale sensor network systems, with acceptable error detection accuracy.
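A toy stand-in for the block-local idea: check each small temporal block of a cluster's stream against its own median, so detection never has to scan the whole data set. The median/MAD rule, block size, and threshold below are invented for illustration and are not the paper's classification scheme.

```python
# Illustrative block-local error detection: each cluster's stream is checked
# within small temporal blocks, flagging readings far from the block median
# in robust (MAD) units. Parameters are made up, not the paper's algorithm.
from statistics import median

def detect_block_errors(readings, block=16, k=5.0):
    """Yield (index, value) pairs suspected to be erroneous."""
    for start in range(0, len(readings), block):
        chunk = readings[start:start + block]
        m = median(chunk)
        mad = median(abs(v - m) for v in chunk) or 1e-9  # avoid zero scale
        for i, v in enumerate(chunk, start):
            if abs(v - m) / mad > k:                     # far from local median
                yield i, v

data = [20.1, 20.3, 19.9, 20.2, 999.0, 20.0, 20.4, 20.1]  # one injected spike
print(list(detect_block_errors(data, block=8)))           # -> [(4, 999.0)]
```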
- Research Article
24
- 10.1016/j.future.2024.06.051
- Jun 29, 2024
- Future Generation Computer Systems
Anomaly-based error and intrusion detection in tabular data: No DNN outperforms tree-based classifiers
- Research Article
17
- 10.1016/0169-7439(88)80011-x
- Jul 1, 1988
- Chemometrics and Intelligent Laboratory Systems
A Bayesian approach to gross error detection in chemical process data: Part I: Model development
- Research Article
13
- 10.1093/bioinformatics/btm260
- May 17, 2007
- Bioinformatics
Errors in nucleotide sequence and SNP genotyping data are problematic when inferring haplotypes. Previously published methods for error detection in haplotype data make use of pedigree information; however, for many samples, individuals are not related by pedigree. This article describes a method for detecting errors in haplotypes by considering the recombinational history implied by the patterns of variation, three SNPs at a time. Coalescent simulations provide evidence that the method is robust to high levels of recombination as well as homologous gene conversion, indicating that patterns produced by both proximate and distant SNPs may be useful for detecting unlikely three-site haplotypes. The Perl script implementing the described method is called EDUT (Error Detection Using Triplets) and is available on request from the authors. Supplementary data are available at Bioinformatics online.
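A hypothetical sketch of triplet-based screening in the same spirit (not EDUT's actual statistic): slide a three-SNP window across the haplotypes and flag carriers of very rare three-site patterns, since genotyping errors tend to create singleton patterns.

```python
# Toy triplet screen inspired by, but not identical to, EDUT: for each
# window of three SNP sites, count the observed three-site patterns across
# haplotypes and flag haplotypes carrying a pattern seen at most max_count
# times, since errors tend to create rare/singleton patterns.
from collections import Counter

def flag_rare_triplets(haplotypes, max_count=1):
    """haplotypes: equal-length strings over {'0','1'}.
    Returns a set of (haplotype_index, window_start) suspects."""
    suspects = set()
    n_sites = len(haplotypes[0])
    for s in range(n_sites - 2):
        patterns = Counter(h[s:s + 3] for h in haplotypes)
        for i, h in enumerate(haplotypes):
            if patterns[h[s:s + 3]] <= max_count:
                suspects.add((i, s))
    return suspects

haps = ["000", "000", "001", "001", "111", "111", "010"]  # "010" is a singleton
print(flag_rare_triplets(haps))                            # -> {(6, 0)}
```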
- Research Article
1
- 10.21655/ijsi.1673-7288.00295
- Jan 1, 2023
- International Journal of Software and Informatics
Cross-source Data Error Detection Approach Based on Federated Learning
With the emergence and accumulation of massive data, data governance has become an important means of improving data quality and maximizing data value. In particular, data error detection is a crucial step in improving data quality and has attracted wide attention from both industry and academia. Various detection methods tailored to a single data source have been proposed. However, in many real-world scenarios, data are not centrally stored or managed; data from different sources that are highly correlated could be employed to improve error detection accuracy. Unfortunately, due to privacy and security concerns, cross-source data often cannot be integrated centrally. To this end, this paper proposes FeLeDetect, a cross-source data error detection method based on federated learning, which improves error detection accuracy by using cross-source data information while preserving data privacy. First, a graph-based error detection model, GEDM, is presented to capture sufficient data features from each data source. On this basis, the paper designs a federated co-training algorithm, FCTA, to collaboratively train GEDM on different cross-source data without leaking private data. Furthermore, a series of optimizations is designed to reduce communication costs during federated learning and manual labeling efforts. Finally, extensive experiments on three real-world datasets demonstrate that (1) GEDM achieves an average F1-score improvement of 10.3% and 25.2% in the local and centralized scenarios, respectively, outperforming all five existing state-of-the-art error detection methods; and (2) the F1 score of error detection by FeLeDetect is on average 23.2% higher than that of GEDM in the local scenario.
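A schematic of the federated co-training idea: each source trains a local copy of the shared model on its private data, and only model parameters are exchanged and averaged. The toy logistic model and FedAvg-style aggregation below stand in for GEDM and FCTA, which are far richer.

```python
# Schematic federated-averaging loop: raw data never leaves a source; only
# parameters are shared. The linear model and SGD step are placeholders for
# the paper's graph-based model (GEDM) and co-training algorithm (FCTA).
import math

def local_step(w, data, lr=0.1):
    """One pass of logistic-regression SGD on one source's private data."""
    for x, y in data:                       # x: feature list, y: 0/1 label
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def federated_rounds(sources, dim, rounds=10):
    w = [0.0] * dim                                          # shared global model
    for _ in range(rounds):
        locals_ = [local_step(list(w), d) for d in sources]  # train locally
        w = [sum(ws) / len(ws) for ws in zip(*locals_)]      # FedAvg aggregation
    return w

src_a = [([1.0, 0.2], 1), ([0.1, 0.9], 0)]   # two toy sources
src_b = [([0.9, 0.1], 1), ([0.2, 1.0], 0)]
print(federated_rounds([src_a, src_b], dim=2))
```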
- Research Article
8
- 10.1016/0169-7439(88)80085-6
- Aug 1, 1988
- Chemometrics and Intelligent Laboratory Systems
A Bayesian approach to gross error detection in chemical process data
- Conference Article
15
- 10.1109/mfi.1994.398408
- Oct 2, 1994
The paper addresses the question of performance improvement as a result of multisensor data fusion and its ramifications for the design of a data fusion system. Sensor selectivity requires that data quality control and error detection capabilities be incorporated into the fusion design. Data quality control and error detection may not be feasible at the signal level, requiring additional intelligence to be built into the fusion loop, prior to or after fusing the data. This leads to the notion of intelligent data fusion design, which involves pre-fusion data quality control loops for error detection prior to fusion using data models, and post-fusion data quality control loops based on meta-fusion-level inference. Shortcomings in applying optimal fusion rules in the presence of partial statistical knowledge, and means to overcome them, are discussed. The need for data validation and adaptive sensitivity control in the fusion design when optimality conditions are not satisfied is demonstrated, and suggestions for designing the feedback loop are given. The design of an intelligent data fusion system is discussed and a design for adaptive sensor sensitivity control is presented.
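A minimal sketch of a pre-fusion quality control loop, assuming a simple data model: gate each reading on a plausible range and on its distance from the current fused estimate, then fuse the survivors by inverse-variance weighting. Gate limits and variances are illustrative, not the paper's design.

```python
# Pre-fusion data quality control sketch: readings are validated against a
# simple data model (range check plus innovation gate around the previous
# fused estimate) before inverse-variance fusion. All parameters are toy.
def fuse_with_qc(readings, variances, prev_estimate, gate=3.0, lo=-50.0, hi=150.0):
    valid = [
        (z, v) for z, v in zip(readings, variances)
        if lo <= z <= hi                                   # plausible range
        and abs(z - prev_estimate) <= gate * v ** 0.5      # innovation gate
    ]
    if not valid:
        return prev_estimate                               # all rejected: hold estimate
    wsum = sum(1.0 / v for _, v in valid)
    return sum(z / v for z, v in valid) / wsum             # inverse-variance fusion

sensors = [20.2, 19.8, 87.0]        # third sensor is faulty
print(fuse_with_qc(sensors, [0.5, 0.5, 0.5], prev_estimate=20.0))  # -> 20.0
```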
- Research Article
15
- 10.1016/j.procs.2013.05.373
- Jan 1, 2013
- Procedia Computer Science
Autonomous Data Error Detection and Recovery in Streaming Applications
- Book Chapter
- 10.4018/9781930708440.ch016
- Jan 18, 2011
Data stored in organizational databases have a significant error rate. As computerized databases continue to proliferate, the number of errors in stored data and the organizational impact of these errors are likely to increase. The impact of data errors on business processes and decision making can be lessened if users of information systems are able and willing to detect and correct data errors. However, some published research suggests that users of information systems do not detect data errors. This paper reports the results of a study showing that municipal bond analysts detect data errors. The results provide insight into the conditions under which users in organizational settings detect data errors. Guidelines for improving error detection are also discussed.
- Research Article
1
- 10.1080/08874417.1999.11647436
- Dec 1, 1999
- Journal of Computer Information Systems
There is strong evidence that data stored in organizational databases have a significant rate of errors. As computerized databases continue to proliferate, the number of errors in stored data and the organizational impact of these errors are likely to increase. The impact of data errors on business processes and decision making can be lessened if users of information systems are able and willing to detect and correct data errors. However, some published research suggests that users of information systems do not detect data errors. A field study reported here shows that inventory managers detect data errors under some conditions. Perceptions of the payoffs of error detection and the rate of errors in inventory management are also discussed.
- Research Article
2
- 10.1145/3709677
- Feb 10, 2025
- Proceedings of the ACM on Management of Data
String data is common in real-world datasets: 67.6% of values in a sample of 1.8 million real Excel spreadsheets from the web were represented as text. Automatically cleaning such string data can have a significant impact on users. Previous approaches are limited to error detection, require that the user provide annotations, examples, or constraints to fix the errors, and focus independently on syntactic or semantic errors in strings, ignoring that strings often contain both syntactic and semantic substrings. We introduce DataVinci, a fully unsupervised string data error detection and repair system. DataVinci learns regular-expression-based patterns that cover a majority of values in a column and reports values that do not satisfy such majority patterns as data errors. DataVinci can automatically derive edits to the data error based on the majority patterns, using row tuples associated with majority values as examples. To handle strings with both syntactic and semantic substrings, DataVinci uses an LLM to abstract (and re-concretize) portions of strings that are semantic. Because not all data columns yield majority patterns, DataVinci can, when available, leverage execution information from an existing data program (which uses the target data as input) to identify and correct data errors that would not otherwise be found. DataVinci outperforms eleven baseline systems on both data error detection and repair, as demonstrated on four existing and new benchmarks.
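A sketch of the majority-pattern idea (far simpler than DataVinci's learned regex patterns, and without the repair step): abstract each string into character classes, take the column's dominant pattern, and report non-conforming values.

```python
# Majority-pattern error detection sketch: each string is abstracted into a
# character-class pattern; the column's dominant pattern becomes the
# "majority pattern", and values that do not conform are reported as
# suspected errors. DataVinci itself learns richer regex-based patterns.
from collections import Counter

def abstract(s: str) -> str:
    return "".join("d" if c.isdigit() else "a" if c.isalpha() else c for c in s)

def detect_errors(column, majority=0.5):
    patterns = Counter(abstract(v) for v in column)
    top, count = patterns.most_common(1)[0]
    if count / len(column) < majority:
        return []                                  # no majority pattern: abstain
    return [v for v in column if abstract(v) != top]

ids = ["AB-1024", "CX-3391", "Kx-07", "QT-5510"]   # "Kx-07" breaks the pattern
print(detect_errors(ids))                          # -> ['Kx-07']
```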
- Research Article
2
- 10.1016/j.jclinepi.2023.02.015
- Feb 17, 2023
- Journal of clinical epidemiology
Objectives: We evaluated the error detection performance of the DetectDeviatingCells (DDC) algorithm, which flags data anomalies at the observation (casewise) and variable (cellwise) level in continuous variables, and compared its performance to other approaches in a simulated dataset. Study Design and Setting: We simulated height and weight data for hypothetical individuals aged 2–20 years and changed a proportion of height values according to predetermined error patterns. We applied the DDC algorithm and other error detection approaches (descriptive statistics, plots, fixed-threshold rules, and classic and robust Mahalanobis distance) and compared error detection performance using sensitivity, specificity, likelihood ratios, predictive values, and receiver operating characteristic (ROC) curves. Results: At our chosen thresholds, error detection specificity was excellent across all scenarios for all methods, and sensitivity was higher for multivariable and robust methods. The DDC algorithm performed similarly to other robust multivariable methods. Analysis of ROC curves suggested that all methods had comparable performance for gross errors (e.g., wrong measurement unit), but the DDC algorithm outperformed the others for more complex error patterns (e.g., transcription errors that are extreme yet still plausible). Conclusions: The DDC algorithm has the potential to improve error detection processes for observational data.
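DDC itself is implemented in the R package cellWise; as a sketch of one comparator from the study, casewise flagging with a robust Mahalanobis distance can be done with scikit-learn's minimum covariance determinant estimator. The data and cutoff below are illustrative.

```python
# Robust Mahalanobis comparator sketch: fit a minimum covariance determinant
# (MCD) estimate of location/scatter, compute squared robust distances, and
# flag observations beyond a chi-square cutoff. Toy height/weight-like data.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.multivariate_normal([120.0, 25.0], [[64, 24], [24, 16]], size=200)
X[0] = [120.0, 80.0]                     # implausible weight for that height

mcd = MinCovDet(random_state=0).fit(X)   # robust location and scatter
d2 = mcd.mahalanobis(X)                  # squared robust distances
cutoff = chi2.ppf(0.999, df=X.shape[1])  # 99.9% chi-square quantile
print(np.where(d2 > cutoff)[0])          # indices of flagged observations
```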
- Conference Article
3
- 10.1109/dft50435.2020.9250878
- Oct 19, 2020
This work proposes in-situ Real-Time Error Detection (RTD): embedding hardware in a memory array that detects a fault in the array when it occurs, rather than when it is read. RTD breaks the serialization between data access and error detection and can thus speed up the access time of arrays that use in-line error detection and correction. The approach can also reduce the time needed to root-cause array-related bugs during post-silicon validation and product testing. The paper presents how to build RTD into an array using flip-flops that track the column parity in real time, and introduces a two-dimensional RTD-based error correction scheme. Compared to SECDED, the evaluated scheme has comparable error detection and correction strength and, depending on the array dimensions, reduces access time by 8–24% at an area and power overhead of 12–53% and 21–42%, respectively.
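A software illustration of the two-dimensional parity idea: keep row and column parities for a bit array, and a single flipped bit is pinpointed by the unique row/column pair whose parities disagree. The hardware scheme tracks these parities in flip-flops; this sketch only mirrors the logic.

```python
# Two-dimensional parity in software: row and column parities are captured
# at write time; after a single-bit fault, exactly one row parity and one
# column parity disagree, locating (and thus correcting) the flipped bit.
def parities(bits):
    rows = [sum(r) % 2 for r in bits]
    cols = [sum(c) % 2 for c in zip(*bits)]
    return rows, cols

def locate_single_error(bits, rows0, cols0):
    rows1, cols1 = parities(bits)
    bad_r = [i for i, (a, b) in enumerate(zip(rows0, rows1)) if a != b]
    bad_c = [j for j, (a, b) in enumerate(zip(cols0, cols1)) if a != b]
    if len(bad_r) == 1 and len(bad_c) == 1:
        return bad_r[0], bad_c[0]          # exact location of the flipped bit
    return None                             # no error, or more than one

array = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
r0, c0 = parities(array)                   # parities captured at write time
array[1][2] ^= 1                           # inject a single-bit fault
print(locate_single_error(array, r0, c0))  # -> (1, 2)
```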