Abstract

<span lang="EN-US">The purpose of data integration is to integrate the multiple sources of heterogeneous data available on the internet, such as text, image, and video. After this stage, the data becomes large. Therefore, it is necessary to analyze the data that can be used for the efficient execution of the query. However, we have problems with solving entities, so it is necessary to use different techniques to analyze and verify the data quality in order to obtain good data management. Then, when we have a single database, we call this mechanism deduplication. To solve the problems above, we propose in this article a method to calculate the similarity between the potential duplicate data. This solution is based on graphics technology to narrow the search field for similar features. Then, a composite mechanism is used to locate the most similar records in our database to improve the quality of the data to make good decisions from heterogeneous sources.</span>

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.