Comparison of String Similarity Algorithm in post-processing OCR

Al Birr Karim Susanto, Muljono Muljono, Bagus Nugroho, Nuraziz Muliadi

doi:10.33633/jais.v8i1.7079

Open Access

https://doi.org/10.33633/jais.v8i1.7079

Journal: Journal of Applied Intelligent System
Publication Date: Feb 17, 2023
License type: cc-by

Affiliation: Universitas Dian Nuswantoro

#Optical Character Recognition Process #Optical Character Recognition

Abstract

A common problem in Optical Character Recognition (OCR) is that the input image contains noise that partially covers letters in a word. This can cause misspellings during word recognition or detection in the image. After the OCR step, post-processing is therefore needed to correct the recognized words. The words are corrected using a string similarity algorithm, which raises the question of which algorithm works best. We compared four algorithms: Levenshtein distance, Hamming distance, Jaro-Winkler, and the Sørensen-Dice coefficient. In our tests, the most effective algorithm was the Sørensen-Dice coefficient, which scored 0.88 for precision, recall, and F1 score.
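
To make the comparison concrete, the following is a minimal Python sketch of three of the four measures named in the abstract (Jaro-Winkler is omitted for brevity) and of dictionary-based correction of a single OCR token. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the algorithms were configured, so the use of character bigrams for the Sørensen-Dice coefficient, the lowercasing, the `correct` helper, and the example lexicon are all hypothetical choices.

```python
from collections import Counter


def bigrams(word: str) -> Counter:
    """Multiset of adjacent character pairs in a lowercased word."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def dice_coefficient(a: str, b: str) -> float:
    """Sørensen-Dice coefficient on bigrams: 2|X ∩ Y| / (|X| + |Y|)."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both strings are shorter than two characters: fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((x & y).values())  # multiset intersection size
    return 2.0 * overlap / total


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def correct(word: str, lexicon: list[str]) -> str:
    """Hypothetical helper: pick the lexicon entry most similar to `word`."""
    return max(lexicon, key=lambda candidate: dice_coefficient(word, candidate))


if __name__ == "__main__":
    # Assumed example: an OCR engine read the letter "o" as the digit "0".
    lexicon = ["recognition", "recreation", "cognition"]
    print(correct("recogniti0n", lexicon))
```

Note that Hamming distance is only defined for strings of equal length, so it cannot directly score OCR errors that insert or drop a character, whereas the other measures can.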
