Abstract

Image data compression algorithms are essential for reducing storage space and, perhaps more importantly, for increasing image transfer rates, which together improve performance in terms of space-time complexity. Since no single encoder gives good results across all image types and contents, this paper proposes an evolvable lossless statistical block-based technique for segmenting and compressing compound or mixed documents that contain different content types, such as pictures, graphics, and/or text.
 
To achieve better compression ratios, a new, well-defined representation of the image is derived from the number of detected colors while retaining the same image components. First, some preliminary operations are applied to reduce noise and other variations in the scanned image. The proposed algorithm then breaks the compound document image down into equal-size square blocks. Next, based on the number of colors detected in each block, the blocks are categorized into six image objects, called classes, each of which contains closely interrelated pixels that share common relevant attributes such as color gamut, number of colors, color occurrence, and grey level. A new representation of these coherent classes is then formed using the Lookup Dictionary Table (LUD), which is the essence of the proposed algorithm. To form distinguishable labeled regions that share the same attributes, adjacent blocks with similar color features are merged into single coherent entities, called segments or regions. Each region is encoded by the most applicable off-the-shelf compression technique, and the encoded regions are finally fused into a single data file, which is then subjected to another compression stage to ensure better compression ratios. The proposed algorithm was applied and tested on a database of 3151 24-bit RGB bitmap document images. The empirical results show that the overall algorithm is efficient in the long run and achieves superior storage space reduction compared with other existing algorithms, with a relative reduction of 71.039% in data storage space.
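The block-splitting and color-counting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `few_colors_limit` threshold, and the four illustrative labels are assumptions made here for demonstration (the paper uses six classes whose exact criteria are not given in this summary), and the sketch assumes the image dimensions are multiples of the block size.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]          # one (R, G, B) triplet
Image = List[List[Pixel]]             # rows of pixels


def split_into_blocks(image: Image, block_size: int) -> Dict[Tuple[int, int], Image]:
    """Cut the image into block_size x block_size tiles keyed by (block_row, block_col).

    Assumption: height and width are exact multiples of block_size.
    """
    height, width = len(image), len(image[0])
    if height % block_size or width % block_size:
        raise ValueError("image dimensions must be multiples of block_size")
    blocks: Dict[Tuple[int, int], Image] = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            tile = [row[left:left + block_size]
                    for row in image[top:top + block_size]]
            blocks[(top // block_size, left // block_size)] = tile
    return blocks


def count_colors(block: Image) -> int:
    """Number of distinct (R, G, B) values inside one block."""
    return len({pixel for row in block for pixel in row})


def classify_block(block: Image, few_colors_limit: int = 4) -> str:
    """Assign a hypothetical class label from the block's color count.

    The labels and the threshold are illustrative only, not the paper's six classes.
    """
    n = count_colors(block)
    if n == 1:
        return "single-color"
    if n == 2:
        return "two-color"
    if n <= few_colors_limit:
        return "few-color"
    return "many-color"


# Tiny 4x4 demo image split into 2x2 blocks.
W, K = (255, 255, 255), (0, 0, 0)
R, G, B = (255, 0, 0), (0, 255, 0), (0, 0, 255)
demo = [
    [W, W, K, W],
    [W, W, W, K],
    [R, G, R, G],
    [B, R, B, K],
]
labels = {pos: classify_block(tile, few_colors_limit=3)
          for pos, tile in split_into_blocks(demo, 2).items()}
```

In this sketch each block is labeled independently; merging adjacent blocks with the same label into regions, as the abstract describes, would be a separate step that is not shown here.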

Highlights

  • RGB images, referred to as component images, are the most common model of images

  • Growing demand from millions of computer users to store ever more images has made segmentation and compression techniques more intertwined than ever

  • The present work proposes a lossless statistical block-based segmentation technique that works in conjunction with other encoding techniques to segment compound or mixed documents that have different content types, such as pictures, graphics, texts, and/or backgrounds

Summary

Introduction

RGB images, also referred to as component images, are the most common image model. At the pixel level, every RGB image is an M×N×3 array of color pixels, in which each pixel is a triplet of integers. The image contains “M” pixels along the horizontal direction, called the image width, and “N” pixels along the vertical direction, called the image length (or height). The total pixel count is therefore “M” multiplied by “N”, namely “M×N”. The number of bits required to store the three integers of a pixel defines the bit depth, which is also referred to as “pixel depth”, “the number of bits per pixel”, or “grey-scale resolution”.
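As a small worked example of the arithmetic above, the following Python sketch computes the pixel count, the bits per pixel, and the uncompressed size of an RGB image. The function names are hypothetical, and the default of 8 bits per channel is an assumption chosen to match the 24-bit RGB bitmaps mentioned in the abstract.

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of pixels: width times height."""
    return width * height


def bits_per_pixel(channels: int = 3, bits_per_channel: int = 8) -> int:
    """Bit depth of one pixel (assumed 3 channels x 8 bits = 24 bits for RGB)."""
    return channels * bits_per_channel


def raw_size_bytes(width: int, height: int,
                   channels: int = 3, bits_per_channel: int = 8) -> int:
    """Uncompressed pixel data size in bytes, rounded up to a whole byte."""
    total_bits = pixel_count(width, height) * bits_per_pixel(channels, bits_per_channel)
    return (total_bits + 7) // 8


# A 640 x 480 image at 24 bits per pixel:
demo_pixels = pixel_count(640, 480)      # 307200 pixels
demo_bytes = raw_size_bytes(640, 480)    # 921600 bytes of raw pixel data
```

The size computed here covers pixel data only; a real bitmap file would also carry header bytes and possibly row padding, which this sketch ignores.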
