Abstract

There are about 7.5 billion people and more than 7,117 languages in the world, yet only about 20% of people speak English. Language translation is therefore a basic need for understanding the wisdom and knowledge of other cultures. This paper investigates a computer-assisted document parsing tool. The proposed approach uses a language translator that translates text directly from images. This removes the need for a human translator for images and avoids the scope for misinterpretation and misunderstanding among people of different ethnic groups. The tool can also perform web crawling using the Django REST (Representational State Transfer) framework. It uses the Python packages pytesseract, textblob and beautifulsoup for Optical Character Recognition (OCR), translation, and extraction of Hypertext Markup Language (HTML) data, respectively. Translation experiments on four categories of images (maps, comics, newspapers and magazines, and scientific publications) achieve accuracies of 97.2%, 93.3%, 95.82% and 98.27%, respectively. On e-commerce, magazine, blog, social media, news and educational websites, the tool achieves an average precision of 5.4, recall of 7.45 and F-score of 6.24. The results indicate that the proposed system can be used as an improvement over a human translator and a data entry operator.
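The abstract reports precision, recall and F-score for the web-extraction step, but it does not say how they were computed or on what scale. The sketch below illustrates one common approach under stated assumptions, so it should not be read as the paper's code. First, it extracts visible text and hyperlinks from a page. Second, it scores the extracted fragments against a hand-labelled reference set, giving set-based precision, recall and F1 in [0, 1]. To keep the example runnable without third-party packages, it uses the standard-library `html.parser` in place of BeautifulSoup. The names `TextAndLinkExtractor`, `extract` and `precision_recall_f1` are hypothetical.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Collect visible text fragments and hyperlink targets from HTML.

    Hypothetical stand-in for a BeautifulSoup extraction step (assumption).
    """

    # Text inside these tags is not visible page content, so it is skipped.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.texts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            # attrs is a list of (name, value) pairs; keep non-empty hrefs only.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Drop whitespace-only runs and anything inside <script>/<style>.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.texts.append(text)


def extract(html):
    """Return (text fragments, link targets) found in an HTML string."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.texts, parser.links


def precision_recall_f1(extracted, reference):
    """Set-based precision, recall and F1 of extracted items vs. a reference."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    page = (
        "<html><head><title>Shop</title><style>p {color: red}</style></head>"
        "<body><h1>Deals</h1><p>Free shipping</p>"
        '<a href="/cart">Cart</a><script>var x = 1;</script></body></html>'
    )
    texts, links = extract(page)
    print(texts)  # ['Shop', 'Deals', 'Free shipping', 'Cart']
    print(links)  # ['/cart']
    reference = {"Deals", "Free shipping", "Cart", "Contact us"}
    print(precision_recall_f1(texts, reference))  # (0.75, 0.75, 0.75)
```

Set-based scoring ignores order and duplicates, which suits a "did we capture this item" evaluation. An evaluation that counts repeated items or rewards partial text matches would need a different matcher.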
