Abstract

With the continuous development of information technology, multimedia data of many kinds are constantly emerging, and these data are autonomous and heterogeneous. Integrating and analyzing such data correctly and efficiently has become a challenging problem. First, to improve the quality of the integrated data, two real-time threads combined with a data adapter are used to monitor heterogeneous data sources and refresh necessary updates efficiently. Once the original data have been updated, the real-time data are loaded into the data center promptly. Second, a data reverse cleaning method is proposed to improve data quality. It uses the data source tree built during the data integration process to quickly locate the original data after reverse cleaning. Finally, a data accuracy assessment algorithm based on a Bayesian network and the path condition algorithm is designed for data quality assessment. Experimental results show that the quality of the integrated data is significantly higher than that of the original data.
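
The role of the data source tree in reverse cleaning can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a tree of the form root, source, table, record key. The names `SourceNode`, `DataCenter`, `integrate`, and `locate` are hypothetical and chosen only for illustration. The idea shown is that each integrated record keeps a link to its leaf in the tree, so the original location can be recovered by walking the parent links.

```python
from typing import Dict, List, Optional


class SourceNode:
    """One node of a hypothetical data source tree (root, source, table, or record key)."""

    def __init__(self, name: str, parent: Optional["SourceNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: Dict[str, "SourceNode"] = {}

    def child(self, name: str) -> "SourceNode":
        # Create the child on first use, so the tree grows during integration.
        if name not in self.children:
            self.children[name] = SourceNode(name, parent=self)
        return self.children[name]

    def path(self) -> List[str]:
        # Walk parent links up to the root, then reverse to get root-first order.
        parts: List[str] = []
        node: Optional["SourceNode"] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return parts[::-1]


class DataCenter:
    """Hypothetical integrated store that remembers where each record came from."""

    def __init__(self) -> None:
        self.root = SourceNode("root")
        self.records: Dict[str, dict] = {}
        self.origin: Dict[str, SourceNode] = {}

    def integrate(self, record_id: str, source: str, table: str, key: str, values: dict) -> None:
        # Register the record's origin in the tree while loading it.
        leaf = self.root.child(source).child(table).child(key)
        self.records[record_id] = values
        self.origin[record_id] = leaf

    def locate(self, record_id: str) -> List[str]:
        # Reverse lookup: from an integrated record back to its original location.
        return self.origin[record_id].path()


dc = DataCenter()
dc.integrate("r1", "crm_db", "customers", "42", {"age": -5})
location = dc.locate("r1")
```

In this sketch, a record flagged during cleaning (here the implausible `age` value) can be traced back to its source with one dictionary lookup plus a walk whose length equals the depth of the tree, with no need to search the sources themselves.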
