Abstract

With advances in natural language processing and text mining, extracting knowledge from unstructured text has become a growing trend. A comparable corpus consists of two or more corpora composed of texts in different languages or in different variants of the same language. Comparable corpora can be further subdivided into monolingual and bilingual/multilingual corpora. The former collects texts with similar content from a similar language environment, while the latter collects texts in different languages that share similar content, register, and communicative setting, and is mostly used in contrastive linguistics. Optical character recognition (OCR) is currently used mainly for document recognition and certificate recognition. Deep learning can broaden the range of applications of OCR, and applying text region extraction to OCR can improve the accuracy of the extracted text. This paper studies a feature extraction method for machine translation equivalent pairs for OCR, based on a Chinese-English comparable corpus.


