Abstract

With the number of digital videos and digital images increasing tremendously in e-mails and web pages, text extraction from images becomes important more than ever. Born-digital images are generated directly with the computer and the text in the images is important to help the semantic understanding of the images. Although there are many methods proposed over the past years for text extraction from natural scene images, the text detection and extraction from born-digital images remains a challenge. This paper proposes a novel method to segment the text connected components CCs from a born-digital image. Firstly, binarisation is conducted on the given image to get all candidate text CCs based on wavelet theory. Secondly, classification is conducted on the extracted CCs to label text CCs based on conditional random field CRF - a probabilistic graph model that has been widely used in natural language processing. Experimental results show that the proposed method can effectively extract text from the born-digital images.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.