Abstract
Robotic Process Automation (RPA) is a platform for automating dull and repetitive computer processes with software bots, freeing humans for tasks that require creativity and decision-making, which robots cannot perform. Optical Character Recognition (OCR) extracts printed characters from an image and converts them to text. Google Tesseract OCR and Microsoft OCR are the OCR engines commonly available in UiPath, a tool for Robotic Process Automation. In our previous research comparing these two OCR engines, we compared them on basic factors including speed, hardware requirements, and accuracy; in that work, however, accuracy was calculated manually, by substituting scraped data into formulas, which made the results less precise. In this research we obtain more precise results by applying a string-comparison algorithm, the Levenshtein distance algorithm, deployed in UiPath.
Highlights
Robotics and automation stepped into reality a few years ago and are evolving rapidly around the world, in areas such as industrial automation and space engineering, and in urban and rural areas alike
Microsoft OCR, a built-in OCR engine in Microsoft Windows 10, and Tesseract OCR,[2] an open-source OCR engine developed by Google, are the two OCR engines available in UiPath, a tool for Robotic Process Automation
In the previous paper,[1] the accuracy of Tesseract OCR and Microsoft OCR was checked using manual methods, which were not precise
Summary
Robotics and automation stepped into reality a few years ago and are evolving rapidly around the world, in areas such as industrial automation and space engineering, and in urban and rural areas alike.

3. Methodology

In this research proposal, a string-comparison algorithm plays an important role in giving more accurate results than our previous study. The whole sequence of execution is as follows: the data is unzipped and fed from local machine storage to the workflow's OCR engines (either the Microsoft OCR engine or the Tesseract OCR engine first); the extracted data is then redirected to the string-algorithm analysing container, as shown in Figure 2.1. There, the major part is comparing the extracted data with the original data from the images that were fed to the OCR engines. Eventually the accuracy (data) is saved to local storage, or cloud storage can be used if the workflow is deployed to the UiPath Orchestrator. The main upgrades from the previous paper[1] are:
- The same set of source data is supplied to both OCR engines, to obtain better comparison results.
- Both OCR workflows were executed on the same hardware equipment, whereas different hardware was used for the previous comparison.[1]
- Images with lighter backgrounds are used to extract more data and obtain more precise information.
- Images with fancy or decorative fonts are used to test the algorithm.[1]

3.1 What does a container refer to?

Containers, or blocks, are often used in UiPath Studio to group a set of activities or a program so that they execute in a sequential manner. If a container is set as the top-level node, the activities under that container execute first, and the workflow then moves on to the next container holding a set of instructions ready to run after the previous block.
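The accuracy comparison described above rests on the Levenshtein distance between the OCR output and the ground-truth text. UiPath workflows typically invoke such logic via VB.NET/C# activities; the sketch below is an illustrative Python version (the function names `levenshtein_distance` and `ocr_accuracy` and the normalisation formula are our assumptions, not taken from the paper's workflow):

```python
def levenshtein_distance(a: str, b: str) -> int:
    # Dynamic-programming edit distance: the minimum number of single-character
    # insertions, deletions, and substitutions needed to turn a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]

def ocr_accuracy(extracted: str, ground_truth: str) -> float:
    # One possible normalisation: 100% when the strings match exactly,
    # scaled down by edit distance relative to the longer string.
    if not extracted and not ground_truth:
        return 100.0
    dist = levenshtein_distance(extracted, ground_truth)
    return 100.0 * (1 - dist / max(len(extracted), len(ground_truth)))
```

For example, comparing an OCR output "helo" against ground truth "hello" gives an edit distance of 1 and an accuracy of 80% under this normalisation; running the same computation on both engines' outputs over the same image set yields the comparable accuracy figures the methodology calls for.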