Abstract
PDF HTML XML Export Cite reminder Cross-source Data Error Detection Approach Based on Federated Learning DOI: 10.21655/ijsi.1673-7288.00295 Author: Affiliation: Clc Number: Fund Project: Article | Figures | Metrics | Reference | Related | Cited by | Materials | Comments Abstract:With the emergence and accumulation of massive data, data governance has become an important manner to improve data quality and maximize data value. Specifically, data error detection is a crucial step to improve data quality, which has attracted wide attention from both industry and academia. At present, various detection methods tailored for a single data source have been proposed. However, in many real-world scenarios, data are not centrally stored or managed. Data from different sources but highly correlated can be employed to improve the accuracy of error detection. Unfortunately, due to privacy/security issues, cross-source data are often not allowed to be integrated centrally. To this end, this paper proposes FeLeDetect, a cross-source data error detection method based on federated learning, so as to improve the error detection accuracy by using cross-source data information on the premise of data privacy. First, a Graph-based Error Detection Model, namely GEDM, is presented to capture sufficient data features from each data source. On this basis, the paper then designs a federated co-training algorithm, namely FCTA, to collaboratively train GEDM by using different cross-source data without privacy leakage of data. Furthermore, the paper designs a series of optimization methods to reduce communication costs during federated learning and manual labeling efforts. Finally, extensive experiments on three real-world datasets demonstrate that (1) GEDM achieves an average improvement of 10.3% and 25.2% in terms of the $F1$ score in the local and centralized scenarios, respectively, outperforming all the five existing state-of-the-art methods for error detection; (2) the F1 score of the error detection by FeLeDetect is 23.2% on average higher than that by GEDM in the local scenario. Reference Related Cited by
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.