Abstract

Most existing systems suffer from data quality problems. Data quality is degraded by many factors, such as manual operations, software faults, and hardware failures, and data inconsistency is among the most serious of these problems. As an important carrier of data, the database system plays a key role in distributed systems. To reduce the impact of data inconsistency on data quality in distributed database systems, we design and implement a multi-source data inconsistency detection and repair method based on the CRC algorithm. The proposed technique builds on the idea of the rolling checksum used in the rsync algorithm. During inconsistency detection, the method divides each table into chunks and computes the checksums of the chunks in parallel across multiple data tables; these checksums are then used to detect and repair inconsistent data. Experimental results show that the detection effectiveness of this method matches that of the traditional method, which compares source data with target data directly, with a detection rate of up to 99%. At the same time, the method runs faster than the traditional method, reducing running time by about 20%.
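The chunk-and-checksum idea can be illustrated with a minimal sketch. The abstract does not specify the chunk layout, the row encoding, the exact CRC variant, or the repair policy, so everything below is an assumption: a table is modeled as a hypothetical in-memory `dict` from integer primary key to a tuple of column values, chunks are primary-key ranges of size `chunk_size`, each chunk is summarized with Python's `zlib.crc32`, and repair simply copies mismatched chunks from source to target. Key-range chunking stands in for rsync's rolling checksum here, which the sketch does not implement. The helper names (`chunk_of`, `detect`, `repair`, and so on) are illustrative, not the authors' API.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical row model: a table maps an integer primary key to column values.
Table = Dict[int, Tuple]


def chunk_of(pk: int, chunk_size: int) -> int:
    """Assign a row to a chunk by primary-key range (assumed chunking scheme)."""
    return pk // chunk_size


def split_into_chunks(table: Table, chunk_size: int) -> Dict[int, List[Tuple[int, Tuple]]]:
    """Group rows into key-range chunks, keeping rows in primary-key order."""
    chunks: Dict[int, List[Tuple[int, Tuple]]] = {}
    for pk in sorted(table):
        chunks.setdefault(chunk_of(pk, chunk_size), []).append((pk, table[pk]))
    return chunks


def chunk_crc(rows: List[Tuple[int, Tuple]]) -> int:
    """CRC32 over an assumed canonical encoding (repr) of each row, in key order."""
    crc = 0
    for pk, values in rows:
        crc = zlib.crc32(repr((pk, values)).encode("utf-8"), crc)
    return crc


def checksums(table: Table, chunk_size: int, pool: ThreadPoolExecutor) -> Dict[int, int]:
    """Compute one checksum per chunk, dispatching chunks to a worker pool."""
    chunks = split_into_chunks(table, chunk_size)
    ids = list(chunks)
    crcs = pool.map(chunk_crc, (chunks[i] for i in ids))
    return dict(zip(ids, crcs))


def detect(source: Table, target: Table, chunk_size: int) -> List[int]:
    """Return ids of chunks whose checksums differ or exist on only one side."""
    with ThreadPoolExecutor() as pool:
        src = checksums(source, chunk_size, pool)
        dst = checksums(target, chunk_size, pool)
    return sorted(c for c in src.keys() | dst.keys() if src.get(c) != dst.get(c))


def repair(source: Table, target: Table, chunk_size: int) -> List[int]:
    """Overwrite every mismatched target chunk with the source rows; return chunk ids."""
    bad = detect(source, target, chunk_size)
    bad_set = set(bad)
    # Drop all target rows that fall in a mismatched chunk...
    for pk in [k for k in target if chunk_of(k, chunk_size) in bad_set]:
        del target[pk]
    # ...then re-insert the authoritative source rows for those chunks.
    for pk, values in source.items():
        if chunk_of(pk, chunk_size) in bad_set:
            target[pk] = values
    return bad


if __name__ == "__main__":
    source = {pk: (f"user{pk}", pk * 10) for pk in range(10)}
    target = dict(source)
    target[3] = ("user3", 999)  # modified row, falls in chunk 0 when chunk_size=4
    del target[9]               # missing row, falls in chunk 2
    print(detect(source, target, chunk_size=4))  # [0, 2]
    repair(source, target, chunk_size=4)
    print(target == source)  # True
```

Only the mismatched chunks are transferred, which is where the savings over a full row-by-row comparison would come from. Two caveats of this sketch: a 32-bit CRC can in rare cases collide and hide a difference, and in CPython a `ThreadPoolExecutor` gives limited speedup for CPU-bound hashing, so a `ProcessPoolExecutor` or per-table database-side checksums may be closer to true parallelism.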
