Deduplication Methods Using Levenshtein Distance Algorithm

Eugene S Valeriano

doi:10.52783/jes.3480


Open Access



Journal: Journal of Electrical Systems
Publication Date: May 4, 2024
License type: CC BY-ND 4.0



Abstract

The study aimed to propose methods for improving the data integrity of relational databases such as MS SQL, MySQL and PostgreSQL through duplicate-record detection. The FODORS and ZAGAT restaurant benchmark datasets were used to support the preparation and delivery of high-quality data. The Levenshtein distance algorithm was then used as the basis of three proposed methods for reducing duplicate records in the database: default, eliminating equal strings, and knowledge-based libraries. At the selected 70% threshold, an average of 88 of the 112 duplicate records in the restaurant dataset were detected. Finally, how efficiently duplicate records can be detected depends on the data being analyzed and the threshold selected.
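The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: Levenshtein distance converted to a similarity score and compared against a 70% threshold to flag candidate duplicate pairs. The function names (`levenshtein`, `similarity`, `find_duplicates`), the normalisation (1 - distance / length of the longer string) and the case-folding step are assumptions for illustration; they are not the author's default, equal-string-elimination, or knowledge-based-library methods.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,              # deletion
                current[j - 1] + 1,           # insertion
                previous[j - 1] + (ca != cb)  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def find_duplicates(records, threshold=0.70):
    """Return (i, j, score) for every record pair whose similarity reaches
    the threshold (0.70 mirrors the 70% threshold in the abstract)."""
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        # Hypothetical preprocessing: trim and case-fold before comparing.
        score = similarity(r1.strip().lower(), r2.strip().lower())
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Made-up sample records, not taken from the FODORS/ZAGAT data.
records = ["Cafe Alpha, 12 Main St", "cafe alpha 12 main st.", "Blue Door Grill"]
print(find_duplicates(records))  # [(0, 1, 0.91)]
```

Comparing every pair is quadratic in the number of records, which is manageable for a small benchmark but usually calls for blocking on larger tables. Raising or lowering `threshold` trades missed duplicates against false matches, which is consistent with the abstract's observation that results depend on the threshold selected.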
