Abstract

Mixed Raster Content (MRC) is a standard for efficient document compression which can dramatically improve the compression/quality tradeoff compared to traditional lossy image compression algorithms. The key to MRC's performance is the separation of the document into foreground and background layers, with a binary mask indicating which layer each pixel belongs to. Typically, the foreground layer contains text colors, the background layer contains images and graphics, and the binary mask layer represents the fine detail of text fonts. The resulting quality and compression ratio of an MRC document encoder are highly dependent on the segmentation algorithm used to compute the binary mask. In this paper, we propose a novel segmentation method based on the MRC standard (ITU-T T.44). The algorithm consists of two components: Cost Optimized Segmentation (COS) and Connected Component Classification (CCC). The COS algorithm is a blockwise segmentation algorithm formulated in a global cost optimization framework, while CCC is based on feature vector classification of connected components. In the experimental results, we show that the new algorithm achieves the same text detection accuracy as state-of-the-art commercial MRC products, but with fewer false detections of non-text features. This results in high-quality MRC encoded documents with fewer non-text artifacts and a lower bit rate.
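The three-layer model described above can be illustrated with a minimal sketch. It is not the COS or CCC algorithm proposed in the paper: the function names `segment_threshold` and `mrc_decode` are hypothetical, and a fixed grayscale threshold stands in for the segmentation step purely as an assumption. The sketch shows only how a binary mask selects, pixel by pixel, between a foreground layer and a background layer when a document is reconstructed.

```python
# Minimal illustration of MRC's three-layer model (mask, foreground, background).
# ASSUMPTION: a fixed threshold stands in for the segmenter; the paper's
# COS/CCC method is not reproduced here. All names are hypothetical.

def segment_threshold(gray, threshold=128):
    """Return a binary mask: 1 where a pixel is darker than the threshold
    (treated as text/foreground), 0 otherwise (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def mrc_decode(mask, foreground, background):
    """Reconstruct an image by taking the foreground value where the mask
    is 1 and the background value where the mask is 0."""
    return [
        [f if m else b for m, f, b in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


if __name__ == "__main__":
    # A tiny 2x4 grayscale "page": dark pixels play the role of text.
    page = [
        [20, 230, 240, 15],
        [225, 30, 25, 235],
    ]
    mask = segment_threshold(page)

    # Smooth layers: one text color and one paper color. In a real encoder
    # each layer would be compressed with a coder suited to its content.
    foreground = [[0] * 4 for _ in range(2)]
    background = [[255] * 4 for _ in range(2)]

    for row in mrc_decode(mask, foreground, background):
        print(row)
```

Because the mask carries the sharp text edges, the foreground and background layers can stay smooth. Any mask pixel that a segmenter wrongly assigns to the foreground becomes a non-text artifact in the reconstruction, which is why the quality of the segmentation step matters so much.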
