Segmentation for MRC compression

Eri Haneda, Jonghyon Yi, Charles A. Bouman

doi:10.1117/12.711692

Abstract

Mixed Raster Content (MRC) is a standard for efficient document compression that can dramatically improve the compression/quality tradeoff compared to traditional lossy image compression algorithms. The key to MRC's performance is the separation of the document into foreground and background layers, with the separation represented as a binary mask. Typically, the foreground layer contains text colors, the background layer contains images and graphics, and the binary mask layer represents the fine detail of text fonts. The resulting quality and compression ratio of an MRC document encoder depend heavily on the segmentation algorithm used to compute the binary mask. In this paper, we propose a novel segmentation method based on the MRC standard (ITU-T T.44). The algorithm consists of two components: Cost Optimized Segmentation (COS) and Connected Component Classification (CCC). The COS algorithm is a blockwise segmentation algorithm formulated in a global cost optimization framework, while CCC is based on feature vector classification of connected components. In the experimental results, we show that the new algorithm achieves the same accuracy of text detection as state-of-the-art commercial MRC products, but with lower false detection of non-text features. This results in high-quality MRC-encoded documents with fewer non-text artifacts and a lower bit rate.
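
To make the three-layer model described above concrete, the following is a minimal sketch of how a decoder might recombine the layers: each mask pixel selects either the foreground or the background value. It is an illustration only, not the segmentation method proposed in the paper. The function name `mrc_compose`, the use of grayscale values, the assumption that all three layers share one resolution, and the convention that a mask value of 1 selects the foreground are all assumptions made for this example.

```python
def mrc_compose(mask, foreground, background):
    """Recombine three MRC-style layers into one image.

    Assumption for this sketch: all layers are equally sized lists of rows,
    and a mask value of 1 selects the foreground pixel, 0 the background.
    """
    return [
        [fg if m else bg for m, fg, bg in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


# Hypothetical 2x3 grayscale example: the mask marks the "text" pixels.
mask = [[0, 1, 0],
        [1, 1, 0]]
foreground = [[10, 10, 10],
              [10, 10, 10]]          # flat text color
background = [[200, 210, 220],
              [205, 215, 225]]       # smoothly varying image content

image = mrc_compose(mask, foreground, background)
print(image)  # [[200, 10, 220], [10, 10, 225]]
```

The sketch shows why the mask matters so much: sharp text edges live only in the binary mask, so the foreground and background layers can be smooth and compress well. A mask that wrongly marks non-text pixels pushes image detail into the wrong layer, which is the kind of artifact the abstract attributes to false detections.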