Rage for the Machine: Image Compression with Low-Cost Random Access for Embedded Applications
We introduce RAGE, an image compression framework that achieves four generally conflicting objectives: 1) good compression for a wide variety of color images, 2) computationally efficient, fast decompression, 3) fast random access to images with pixel-level granularity without the need to decompress the entire image, and 4) support for both lossless and lossy compression. To achieve these, we rely on the recent concept of generalized deduplication (GD), which is known to provide efficient lossless (de)compression and fast random access in time-series data, and deliver key expansions suitable for image compression, both lossless and lossy. Using nine different datasets, including graphics, logos, and natural images, we show that RAGE achieves compression ratios similar to or better than state-of-the-art lossless image compressors, while delivering pixel-level random access capabilities. Tests on an ARM Cortex-M33 platform show seek times between 9.9 and 40.6 ns and average decoding times per pixel between 274 and 1226 ns. Our measurements also show that RAGE's lossy variant, RAGE-Q, outperforms JPEG severalfold in terms of distortion in embedded graphics and achieves reasonable compression and distortion for natural images.
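To make the random-access claim concrete, here is a minimal Python sketch of the generalized deduplication (GD) idea the abstract builds on: each value is split into a base (high bits), which is deduplicated into a small dictionary, and a deviation (low bits) stored per pixel. Because every pixel keeps a fixed-size record, any pixel can be reconstructed without decompressing its neighbors. The base/deviation split, bit widths, and function names are illustrative assumptions, not the actual RAGE format.

```python
# Illustrative sketch of generalized deduplication (GD) on 8-bit pixel values.
# The base/deviation split and dictionary layout are simplified assumptions,
# not the exact RAGE format.

def gd_compress(pixels, base_bits=4):
    """Split each pixel into a base (high bits) and a deviation (low bits).
    Identical bases are stored once; each pixel keeps a fixed-size
    (base_id, deviation) pair, which preserves O(1) random access."""
    shift = 8 - base_bits
    bases = []                 # deduplicated dictionary of bases
    base_index = {}            # base value -> id
    encoded = []               # per-pixel fixed-size records
    for p in pixels:
        base, dev = p >> shift, p & ((1 << shift) - 1)
        if base not in base_index:
            base_index[base] = len(bases)
            bases.append(base)
        encoded.append((base_index[base], dev))
    return bases, encoded

def gd_access(bases, encoded, i, base_bits=4):
    """Random access to pixel i without decompressing the whole image."""
    base_id, dev = encoded[i]
    return (bases[base_id] << (8 - base_bits)) | dev

pixels = [17, 18, 19, 200, 201, 18]
bases, enc = gd_compress(pixels)
assert gd_access(bases, enc, 3) == 200
```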
- Conference Article
- 10.1109/cisp-bmei56279.2022.9980037
- Nov 5, 2022
In this work, a Discrete Cosine Transform (DCT) operation circuit based on the highly fault-tolerant stochastic computing paradigm is designed and built to realize image compression. To generate stochastic bit streams with high efficiency and low cost, resistive random access memory (RRAM) is used as the stochastic number generator by employing its probabilistic switching behavior. The factors affecting the quality of stochastic-computing-based image compression are studied. The results are compared with those of the traditional binary image compression method, showing that the longer the stochastic bit stream, the more accurate the stochastic number generated by the RRAM, and the better the image compression quality. The speed, energy, and hardware resources consumed by the circuit, as well as the image compression quality, are evaluated. Compared with the traditional method, the RRAM-based stochastic computing method reduces area by 55% and energy by 46%. The effect of different error flip rates on image compression quality is also studied; the stochastic computing compression is highly error-tolerant compared with traditional binary compression.
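As a rough illustration of why longer bit streams improve accuracy in stochastic computing, the sketch below encodes values as Bernoulli bit streams and multiplies them with a bitwise AND; the estimate converges to the true product as the stream length grows. It uses a software pseudo-random generator in place of the paper's RRAM-based stochastic number generator, and all names are hypothetical.

```python
import random

def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a stochastic bit stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b, length, rng=random.Random(0)):
    """Stochastic multiplication: AND two independent streams and count ones.
    The estimate converges to a*b as the stream length grows."""
    sa, sb = to_stream(a, length, rng), to_stream(b, length, rng)
    return sum(x & y for x, y in zip(sa, sb)) / length

# Longer streams reduce the error of the estimate of 0.5 * 0.4 = 0.2.
for n in (64, 1024, 16384):
    print(n, sc_multiply(0.5, 0.4, n))
```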
- Conference Article
- 10.1109/dicta.2011.56
- Dec 1, 2011
In this paper, a method for lossless compression of medical CT images is presented. The method allows separate compression, transmission, and decompression using data segmentation based on the Hounsfield scale. It extends our previous method by modifying the prediction scheme, which omits inter-slice prediction to allow random access into the compressed, segmented CT slices. The results show that the obtained compression rate is comparable to that of the previous method.
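A minimal sketch of Hounsfield-scale segmentation, assuming illustrative HU ranges (the paper's actual thresholds and prediction scheme are not reproduced): each range becomes a separate segment that can be compressed, transmitted, and decompressed on its own.

```python
import numpy as np

# Illustrative Hounsfield-unit (HU) ranges; not the paper's actual thresholds.
SEGMENTS = {
    "air":         (-1024, -200),
    "soft_tissue": (-200, 300),
    "bone":        (300, 3071),
}

def segment_slice(slice_hu):
    """Split a CT slice into per-range segments so each can be compressed,
    transmitted, and decompressed independently of the others."""
    return {
        name: np.where((slice_hu >= lo) & (slice_hu < hi), slice_hu, 0)
        for name, (lo, hi) in SEGMENTS.items()
    }

slice_hu = np.random.randint(-1024, 3071, size=(64, 64), dtype=np.int16)
parts = segment_slice(slice_hu)
```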
- Conference Article
- 10.1109/dcc.2008.47
- Mar 1, 2008
- DCC
Recent advances in compressed data structures have led to the new concept of self-indexing; it is possible to represent a sequence of symbols compressed in a form that enables fast queries on the content of the sequence. This paper studies different analogies of self-indexing on images. First, we show that a key ingredient of many self-indexes for sequences, namely the wavelet tree, can be used to obtain both lossless and lossy compression with random access to pixel values. Second, we show how to use self-indexes for sequences as a black-box to provide self-indexes for images with filtering-type query capabilities. Third, we develop a tailor-made self-index for images by showing how to compress two-dimensional suffix arrays. Experimental results are provided to compare the compressibility to standard compression methods.
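A minimal pointer-based wavelet tree, the ingredient the abstract highlights, showing how access(i) recovers a symbol (here, a pixel value) by following rank queries down the tree. A practical self-index would use succinct bitvectors with O(1) rank/select rather than the plain Python lists assumed here.

```python
class WaveletTree:
    """Minimal pointer-based wavelet tree over an integer alphabet.
    Supports access(i) without storing the original sequence explicitly."""
    def __init__(self, seq, lo=None, hi=None):
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        if self.lo == self.hi:
            self.bits = None        # leaf: a single symbol value
            return
        mid = (self.lo + self.hi) // 2
        self.bits = [1 if s > mid else 0 for s in seq]
        left = [s for s in seq if s <= mid]
        right = [s for s in seq if s > mid]
        self.left = WaveletTree(left, self.lo, mid) if left else None
        self.right = WaveletTree(right, mid + 1, self.hi) if right else None

    def access(self, i):
        if self.bits is None:
            return self.lo
        b = self.bits[i]
        # rank of bit b in the prefix [0, i): position of symbol i in the child
        r = sum(1 for x in self.bits[:i] if x == b)
        return (self.right if b else self.left).access(r)

pixels = [3, 1, 4, 1, 5, 9, 2, 6]
wt = WaveletTree(pixels)
assert wt.access(4) == 5   # random access to the 5th pixel value
```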
- Conference Article
- 10.1109/adfsp.1998.685718
- Jun 5, 1998
A VLSI architecture designed to perform real-time image compression using wavelets is described. The two basic modules of the architecture are a 2-D wavelet transform generator and a coder based on the SPIHT algorithm for lossy image compression. A folded architecture is proposed for computing the 2-D wavelet transform. The architecture uses 3 parallel computational units and 2 storage units. The hardware for the SPIHT coder uses 2 content addressable memories and 3 random access memories. The designs are modular and can easily be extended for different levels of wavelet decomposition and filter lengths. The derived architecture has been functionally verified for an 8×8 image size by simulating its VHDL code using Mentor Graphics.
- Research Article
- 10.1109/83.847830
- Jul 1, 2000
- IEEE Transactions on Image Processing
A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT). The algorithm exhibits state-of-the-art compression performance while producing a bit-stream with a rich set of features, including resolution and SNR scalability together with a "random access" property. The algorithm has modest complexity and is suitable for applications involving remote browsing of large compressed images. The algorithm lends itself to explicit optimization with respect to MSE as well as more realistic psychovisual metrics, capable of modeling the spatially varying visual masking phenomenon.
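A small sketch of the optimized-truncation idea: each independently coded block exposes candidate (rate, distortion) truncation points, and a Lagrangian threshold picks one per block. This is a generic rate-distortion allocation illustration, not the exact EBCOT/PCRD procedure, and the candidate values are made up.

```python
def select_truncation_points(blocks, lam):
    """For each independently coded block, pick the truncation point that
    minimizes D + lam * R over its candidate (rate, distortion) pairs.
    'blocks' is a list of lists of (rate, distortion) truncation candidates."""
    chosen = []
    for candidates in blocks:
        best = min(candidates, key=lambda rd: rd[1] + lam * rd[0])
        chosen.append(best)
    return chosen

# Two blocks, each with truncation candidates (rate in bytes, distortion in MSE).
blocks = [
    [(0, 100.0), (40, 30.0), (90, 5.0)],
    [(0, 60.0), (25, 20.0), (70, 2.0)],
]
# Sweeping the Lagrange multiplier trades total rate against total distortion.
print(select_truncation_points(blocks, lam=0.5))
print(select_truncation_points(blocks, lam=2.0))
```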
- Conference Article
- 10.1117/12.18873
- Jul 1, 1990
- Proceedings of SPIE, the International Society for Optical Engineering
Color 35mm photographic slides are commonly used in dermatology for education and patient records. An electronic storage and retrieval system for digitized slide images may offer advantages such as preservation and random access. We have integrated a system based on a personal computer (PC) for digital imaging of 35mm slides that depict dermatologic conditions. Such systems require significant resources to accommodate the large image files involved, so methods to reduce storage requirements and access time through image compression are of interest. This paper contains an evaluation of one such compression method, which uses the Hadamard transform implemented on a PC-resident graphics processor. Image quality is assessed by determining the effect of compression on the performance of an image feature recognition task.
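A brief sketch of Hadamard-transform block compression as described: transform an 8×8 block, discard small coefficients, and reconstruct. The Sylvester construction and the keep-k thresholding below are generic stand-ins for the graphics-processor implementation evaluated in the paper.

```python
import numpy as np

def hadamard(n):
    """Build an n x n Hadamard matrix (n a power of two), Sylvester's construction."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def compress_block(block, keep=16):
    """2-D Hadamard transform of an 8x8 block, keeping only the 'keep'
    largest-magnitude coefficients (a crude stand-in for quantization)."""
    H = hadamard(8)
    coeffs = H @ block @ H.T / 8.0        # forward transform
    flat = np.abs(coeffs).ravel()
    thresh = np.sort(flat)[-keep]
    coeffs[np.abs(coeffs) < thresh] = 0   # discard small coefficients
    return H.T @ coeffs @ H / 8.0         # reconstruction from kept coefficients

block = np.random.randint(0, 256, size=(8, 8)).astype(float)
approx = compress_block(block)
```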
- Conference Article
- 10.1117/12.512536
- Nov 20, 2003
- Proceedings of SPIE, the International Society for Optical Engineering
JPEG2000 is the new ISO/IEC image compression standard. It is a full coding system targeted at various imaging applications. Besides offering state-of-the-art still image compression, it provides new features such as scalability in quality and in resolution, random access, and region of interest (ROI) coding. Motion JPEG2000 is a derived video compression standard based on intra-frame coding using JPEG2000. JPIP (the JPEG2000 Interactive Protocol) is a developing protocol for the access and transmission of JPEG2000 coded data and related metadata in a networked environment. In this paper, we present various applications of JPEG2000, Motion JPEG2000, and JPIP, geared especially towards the wireless mobile environment. We present an Image Surfing system for browsing JPEG2000 images on mobile terminals over a wireless network. We also present a scheme for tracking and coding Regions of Interest (ROI) over a Motion JPEG2000 sequence. Finally, we present a Partial Coding scheme for use in Motion JPEG2000 sequences that gives coding gains for certain types of video sequences.
- Conference Article
- 10.1117/12.937005
- Jan 9, 1984
- Proceedings of SPIE, the International Society for Optical Engineering
Many applications in visual inspection and in robot vision require very fast interpretation and evaluation of images, so that industrial processes are not slowed down. In this paper an image computer architecture is proposed, which is optimized for two classes of typical image processing tasks required in visual inspection and robot vision. Time-filtering, motion analysis, stereo vision, image compression, and matching are examples of the first class of image processing tasks. They have in common that they require parallel processing of different images, so they need parallel access to them as source, reference, or parameter matrices. To the second class of image processing tasks belongs feature extraction; in contrast with the preceding, this processing requires only one image. These algorithms often do not scan through the image in a predetermined way; they are often driven by the image data itself, where pixel data are fed back to compute the next addresses (e.g., connectivity analysis). Fast random access to the image memories and an efficient data-to-address feedback are then needed. The basic element of the system is a fast Image Bus (IBUS), which reflects the needs of parallel-access and random-access processing: four differently programmable data channels, two address buses, and data-to-address feedback capabilities. The system itself is modular to allow minimum configurations in a wide variety of industrial visual inspection tasks.
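A small sketch of the data-to-address feedback pattern the second task class requires, using connectivity analysis as the example: the value read at one pixel determines which neighbouring addresses are fetched next, which is why fast random access to image memory matters. The flat row-major image layout is an assumption.

```python
def connected_component(image, seed, width, height):
    """Data-driven traversal: the value at each visited pixel determines
    which neighbouring addresses are fetched next (connectivity analysis)."""
    target = image[seed[1] * width + seed[0]]
    stack, visited, component = [seed], set(), []
    while stack:
        x, y = stack.pop()
        if (x, y) in visited or not (0 <= x < width and 0 <= y < height):
            continue
        visited.add((x, y))
        if image[y * width + x] != target:       # random access into image memory
            continue
        component.append((x, y))
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return component

img = [0, 0, 1,
       1, 1, 1,
       0, 1, 0]
print(connected_component(img, (2, 0), width=3, height=3))
```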
- Research Article
- 10.1109/access.2023.3268992
- Jan 1, 2023
- IEEE Access
Visual Odometry (VO) systems are widely used to determine the position and orientation of a robot or camera in an unknown environment. They are deployed on resource-constrained platforms, such as drones and Virtual Reality (VR) or Augmented Reality (AR) headsets. VO systems harnessing modern Systems-on-Chip (SoCs) with an integrated Field Programmable Gate Array (FPGA) have the potential to improve overall system performance. This paper explores the FPGA acceleration of sparse VO kernels using High-Level Synthesis (HLS), as this kind of VO system is designed to be used with low-power SoCs. We show that both the computational and the data transfer overheads between the processing cores of the SoC's CPU and the accelerators on the FPGA need to be optimized to obtain better end-to-end performance. This is a result of the additional data movement incurred when using an FPGA accelerator, and also of the sparse computational nature, with predictable or random memory access patterns, of the kernels involved. However, state-of-the-art HLS tools are not yet able to perform the required optimizations automatically, because they usually assume that the kernels to be accelerated have dense computational patterns with regular memory access. In this paper we propose three potentially generic methods to reduce the data transfer between the CPU and the customised hardware kernels on the FPGA: (a) approximation based on domain-specific knowledge, (b) image compression, and (c) on-the-fly computation. We present a case study of the use of these methods on SVO, a state-of-the-art sparse VO system with a semi-direct front-end. We demonstrate that our proposed methods reduce the data transfer overhead to achieve better end-to-end performance and that they can be applied not only with standard Xilinx HLS tools but also with other state-of-the-art HLS tools, such as HeteroFlow. Compared to the baseline performance of the original SVO software on an Arm CPU, our proposed methods help the HLS and HeteroFlow designs achieve speedups of 2.4x and 2.14x, respectively, without noticeable accuracy loss. The HLS and HeteroFlow designs also achieve 1.85x and 1.89x improvements in energy efficiency, respectively, on the SoC system used. Compared to the SVO software baseline running on an Intel Xeon CPU, our proposed methods help the HLS and HeteroFlow designs achieve 8.2x and 8.3x improvements in energy efficiency, respectively.
- Research Article
- 10.1109/tmm.2020.3017890
- Sep 23, 2020
- IEEE Transactions on Multimedia
- Research Article
- 10.1007/s11771-010-0557-6
- Aug 1, 2010
- Journal of Central South University of Technology
To compress screen image sequences in real-time remote and interactive applications, a novel compression method, named CABHG, is proposed. CABHG employs hybrid coding schemes that consist of intra-frame and inter-frame coding modes. The intra-frame coding uses rate-distortion optimized adaptive block sizes and can also be used to compress a single screen image. The inter-frame coding utilizes a hierarchical group of pictures (GOP) structure to improve system performance during random access and fast-backward scans. Experimental results demonstrate that the proposed CABHG method has an approximately 47%–48% higher compression ratio and 46%–53% lower CPU utilization than professional screen image sequence codecs such as the TechSmith Ensharpen codec and the Sorenson 3 codec. Compared with general video codecs such as the H.264 codec, the XviD MPEG-4 codec, and Apple's Animation codec, CABHG also shows an 87%–88% higher compression ratio and 64%–81% lower CPU utilization.
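To illustrate why a hierarchical GOP helps random access, the sketch below computes which frames must be decoded to reach an arbitrary frame under a dyadic hierarchical prediction layout (an assumed layout, not the CABHG specification): the chain length grows logarithmically with the GOP size instead of linearly.

```python
def decode_chain(target, gop_size=8):
    """Frames a hierarchical GOP must decode to reach 'target'.
    Frame 0 of each GOP is an intra frame; other frames reference frames
    at coarser levels of the hierarchy (a common dyadic layout, assumed here)."""
    base = (target // gop_size) * gop_size
    offset = target - base
    chain = [base]                       # start from the GOP's intra frame
    step, pos = gop_size, 0
    while pos != offset:
        step //= 2
        # move toward the target in halving steps, decoding one frame per level
        if offset >= pos + step:
            pos += step
            chain.append(base + pos)
    return chain

# Reaching frame 13 needs only O(log GOP) decodes instead of 13 sequential ones.
print(decode_chain(13))   # -> [8, 12, 13]
```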
- Conference Article
- 10.1117/12.2322257
- May 1, 1992
Printers process large quantities of data when printing. For example, printing an A3 page (297 mm × 420 mm) at 300 dpi resolution requires 17.4 million pixels, about 66 Mbytes for a 32-bit/pixel color image composed of Yellow (Y), Magenta (M), Cyan (C), and Black components. Holding such a large capacity of Random Access Memory (RAM) in a printer increases both the cost and the size of the memory circuits, so image compression techniques are examined in this study to cope with these problems. The still-image coding being standardized by JPEG (the Joint Photographic Experts Group) will presumably be utilized for image communications and image databases. The JPEG scheme can compress natural images efficiently, but it is unsuitable for text or Computer Graphics (CG) images because of the degradation of the restored images; it therefore cannot be used in printers, which require good image quality. We studied codings that are more suitable for printers than the JPEG scheme. Two criteria were considered to select a coding scheme suitable for printers: (1) no visible degradation of input printer images; (2) capability of image editing. Especially because of criterion (2), a fixed-length coding was adopted, so that the code for an arbitrary pixel can easily be read out of the image memory. We then implemented this image coding scheme in our new sublimation full-color printer; input image data are compressed by the coding before being written into the image memory.
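A minimal sketch of why fixed-length coding enables editing and random read-out: the location of any pixel's code in image memory is a direct function of its coordinates, so single codes can be read or rewritten in place. The 8-bit code length and the A3/300-dpi width used below are illustrative.

```python
def code_address(x, y, width, code_bits):
    """With fixed-length codes, the bit offset of pixel (x, y) in the image
    memory is a direct function of its coordinates, so a single pixel's code
    can be read or rewritten without touching the rest of the image."""
    bit_offset = (y * width + x) * code_bits
    return bit_offset // 8, bit_offset % 8   # (byte address, bit within byte)

# A3 at 300 dpi is roughly 3508 pixels wide; 8-bit fixed-length codes assumed.
print(code_address(x=1200, y=2048, width=3508, code_bits=8))
```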
- Conference Article
- 10.1117/12.647501
- Jan 15, 2006
- Proceedings of SPIE, the International Society for Optical Engineering
One interesting feature of image compression is support for region of interest (ROI) access, in which an image sequence can be encoded only once and the decoder can then directly extract a subset of the bitstream to reconstruct a chosen ROI at the required quality. In this paper, we apply Three-dimensional Subband Block Hierarchical Partitioning (3-D SBHP), a highly scalable wavelet-transform-based algorithm, to volumetric medical image compression to support ROI access. The code-block selection method by which random access decoding can be achieved is outlined and its performance empirically investigated. The experimental results show that a number of parameters affect the effectiveness of ROI access, the most important being the ROI size, the code-block size, the wavelet decomposition level, the number of filter taps, and the target bit rate. Finally, one possible way to optimize ROI access performance is addressed.
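A simplified sketch of ROI-to-code-block mapping: the ROI is scaled down by two per decomposition level and intersected with the code-block grid to decide which blocks to extract from the bitstream. It ignores subband layout and the filter-tap expansion that the abstract notes also affects ROI access, and the grid parameters are assumptions.

```python
def roi_codeblocks(roi, levels, block_size=32):
    """Map a spatial ROI (x0, y0, x1, y1) to the code-block indices that must
    be extracted from the bitstream at each wavelet decomposition level.
    The ROI shrinks by a factor of two per level in each dimension."""
    x0, y0, x1, y1 = roi
    selected = {}
    for level in range(1, levels + 1):
        sx0, sy0 = x0 >> level, y0 >> level
        sx1, sy1 = (x1 >> level) + 1, (y1 >> level) + 1
        selected[level] = {
            (bx, by)
            for bx in range(sx0 // block_size, (sx1 - 1) // block_size + 1)
            for by in range(sy0 // block_size, (sy1 - 1) // block_size + 1)
        }
    return selected

# A 100x80 ROI in a 3-level decomposition with 32x32 code-blocks.
print(roi_codeblocks((256, 192, 356, 272), levels=3))
```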
- Conference Article
- 10.1145/360276.360350
- Feb 1, 2001
Discrete wavelet transformation (DWT) followed by embedded zerotree encoding is a very efficient technique for image compression [TenLectures, Shapiro, Spiht]. However, the algorithms proposed in the literature assume random access to the whole image, which makes them unsuitable for hardware solutions because of extensive access to external memory. Here, we present an efficient architecture for computing the DWT of images, based on a partitioned approach to lossy image compression [Ritter]. The architecture achieves its computational power by using pipelining and by taking advantage of the flexible memory configurations available in FPGAs.
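A small sketch of the partitioned approach: a one-level 2-D Haar DWT applied tile by tile, so only a tile-sized buffer is needed rather than random access to the whole image. The Haar filter stands in for the actual filters and boundary handling of the cited architecture.

```python
import numpy as np

def haar_dwt_2d(tile):
    """One-level 2-D Haar DWT of a single image tile. Processing the image
    tile by tile means only a tile-sized buffer is needed on-chip, instead
    of random access to the whole image."""
    def haar_1d(x):
        even, odd = x[..., 0::2], x[..., 1::2]
        return np.concatenate([(even + odd) / 2.0, (even - odd) / 2.0], axis=-1)
    rows = haar_1d(tile.astype(float))                 # transform rows
    return haar_1d(rows.swapaxes(0, 1)).swapaxes(0, 1) # then columns

def dwt_partitioned(image, tile=32):
    h, w = image.shape
    out = np.empty((h, w))
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            out[y:y + tile, x:x + tile] = haar_dwt_2d(image[y:y + tile, x:x + tile])
    return out

image = np.random.randint(0, 256, size=(128, 128))
coeffs = dwt_partitioned(image)
```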
- Research Article
- 10.1007/s00330-003-1880-1
- Apr 12, 2003
- European radiology
The aim of this study was to assess the performance of Web-based image distribution when multiple personal computers (PCs) are downloading images simultaneously for different server hardware configurations. Using specially developed software, the time-to-display (TTD) of different image types was measured with up to 16 concurrent PCs for various combinations of processor, random access memory (RAM), network connection, and image compression. The TTD increased linearly with the number of concurrent PCs but remained under 5 s in most of the cases, even with 16 concurrent PCs. Only with a 10-Mbit/s network connection or with lossy compression were TTDs above 5 s obtained. Two processors instead of one led to a slight and constant improvement of the TTD. Reducing the amount of RAM increased the TTD mainly for computed radiography (CR) images. There was no difference between a 200- and a 100-Mbit/s network, but 10 Mbit/s proved significantly worse. When increasing the number of clients, lossless compression performed substantially better than lossy compression. A standard off-the-shelf server provides appropriate download performance even with 16 concurrent clients. Processor speed and RAM amount are of minor importance, but it is highly recommended to use a 100-Mbit/s network connection and to avoid the application of on-demand lossy compression in a local area network.