Low Cost Automatic Document Scan Machine Using Raspberry Pi and Tesseract OCR

Prateek Luthra, Irshad Ahmad Ansari, Praveen Chauhan

doi:10.2139/ssrn.3575389

Abstract

Data digitization is a pressing need because it makes data storage and processing convenient and fast. Libraries around the world hold a large amount of valuable knowledge in printed form, which needs to be converted to the digital domain for easy processing. An appropriate mechanical model combined with optical character recognition (OCR) is a possible solution. In this work, a low-cost automatic document scanning machine is proposed that uses a Raspberry Pi module to turn the pages and captures images of the text with a camera. Scanners available on the market are generally costly and limited to a single page size. An automatic approach that can scan pages of varying length and GSM (grams per square meter) at low cost can be very handy. The flexibility to change the positions of the sensor, camera, and motors allows the proposed device to do so. Image pre-processing is performed before the scanned images are fed into Tesseract OCR to improve recognition accuracy.
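The abstract does not say which pre-processing steps are applied, so the following is only an illustrative sketch of one common choice, Otsu binarization, written in plain Python. The function names `otsu_threshold` and `binarize` are hypothetical and do not come from the paper; the image is assumed to be an 8-bit grayscale array given as a list of rows.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold (0-255) for a flat list of 8-bit gray values.

    Assumption: this is one plausible pre-processing step, not necessarily
    the one used by the authors.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to pure black (0) and white (255)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


# Tiny synthetic "page": dark text pixels (~30) on a bright background (~200).
page = [[30, 35, 200],
        [210, 32, 205]]
clean = binarize(page)
```

A binarized image of this kind would then be handed to the OCR engine; separating dark text from the lighter page background before recognition is a common way to reduce the effect of uneven lighting in camera captures.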

Full Text