Large Random Access Memory Research Articles

This paper proposes a novel algorithm to compute the 2-D discrete wavelet transform (DWT) of high-resolution (HR) images on low-cost visual sensor and Internet of Things (IoT) nodes. The main advantages of the proposed segmented modified fractional wavelet filter (SMFrWF) are reduced computation (time) complexity and energy consumption compared to the state-of-the-art low-memory 2-D DWT computation methods. In particular, the conventional convolution-based DWT is very fast but requires large random access memory (RAM), as the entire image needs to be in the system memory. The fractional wavelet filter (FrWF) requires only a small RAM but has high complexity due to multiple readings of image lines. The proposed SMFrWF avoids the multiple readings of image lines, thus reducing the memory read access time and, thereby, the complexity. We evaluated the proposed SMFrWF through MATLAB simulations with 70 popular gray-scale test images of dimensions ranging from $256 \times 256$ up to $8192 \times 8192$ pixels. The results show that for images of size $2048\times 2048$ pixels, the proposed SMFrWF (with four segments per line) has 16.8% and 53.6% lower time complexities than the conventional DWT and FrWF, respectively. The proposed SMFrWF has also been modeled in a hardware description language (HDL) and implemented on an Artix-7 field-programmable gate array (FPGA) platform to evaluate the hardware performance. We observed that the proposed SMFrWF has 65% lower energy consumption than the FrWF (both implemented on the same board). Thus, the proposed SMFrWF appears suitable for computing the wavelet transform coefficients of HR images on low-cost visual sensors and IoT platforms.
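The abstract contrasts whole-image convolution, which needs the entire image in RAM, with line-based low-memory schemes. The following Python/NumPy sketch illustrates only the general line-based idea. It is not the authors' SMFrWF or FrWF, whose filters and schedules are not given here. It computes one level of an orthonormal Haar 2-D DWT while holding just two image rows in memory. The Haar filter, the function names `haar_1d`, `haar2d_line_based` and `haar2d_full`, and the row-streaming interface are all assumptions chosen for brevity.

```python
import numpy as np
from typing import Iterable, Iterator, Tuple

SQRT2 = np.sqrt(2.0)


def haar_1d(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Orthonormal Haar analysis along the last axis: (approximation, detail)."""
    return (x[..., 0::2] + x[..., 1::2]) / SQRT2, (x[..., 0::2] - x[..., 1::2]) / SQRT2


def haar2d_line_based(rows: Iterable[np.ndarray]) -> Iterator[Tuple[np.ndarray, ...]]:
    """One level of a 2-D Haar DWT (hypothetical sketch) holding two rows in RAM.

    For every pair of input rows, yields one output row of each subband
    (LL, LH, HL, HH): first letter = vertical filter, second = horizontal.
    """
    it = iter(rows)
    for r0 in it:
        r1 = next(it, None)  # the vertical Haar filter spans exactly two rows
        if r1 is None:
            raise ValueError("image height must be even")
        r0 = np.asarray(r0, dtype=float)
        r1 = np.asarray(r1, dtype=float)
        lo = (r0 + r1) / SQRT2  # vertical low-pass row
        hi = (r0 - r1) / SQRT2  # vertical high-pass row
        ll, lh = haar_1d(lo)    # horizontal pass on the low row
        hl, hh = haar_1d(hi)    # horizontal pass on the high row
        yield ll, lh, hl, hh


def haar2d_full(img: np.ndarray) -> Tuple[np.ndarray, ...]:
    """Reference version that needs the whole image in memory at once."""
    img = np.asarray(img, dtype=float)
    lo = (img[0::2] + img[1::2]) / SQRT2
    hi = (img[0::2] - img[1::2]) / SQRT2
    ll, lh = haar_1d(lo)
    hl, hh = haar_1d(hi)
    return ll, lh, hl, hh


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(8, 8)).astype(float)
    rows = iter(img)  # stand-in for reading rows one at a time from external storage
    ll_rows = [ll for ll, _, _, _ in haar2d_line_based(rows)]
    print(np.allclose(np.vstack(ll_rows), haar2d_full(img)[0]))  # True
```

The streaming version produces the same coefficients as the whole-image version. Its working memory is bounded by two input rows plus one output row per subband, and it does not grow with image height.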

With the advent of the multicore central-processing unit (CPU), today's commodity PC clusters are effectively collections of interconnected parallel computers, each with multiple multicore CPUs and a large shared random access memory (RAM), connected by high-speed networks. Each computer, referred to as a compute node, is a powerful parallel computer in its own right. Each compute node can further be equipped with acceleration devices such as the general-purpose graphical processing unit (GPGPU) to speed up the computationally intensive portions of the simulator. Reservoir-simulation methods that can exploit this heterogeneous hardware can solve very-large-scale reservoir-simulation models and run significantly faster than conventional simulators. Because typical PC clusters are essentially distributed shared-memory computers, mixed-paradigm (distributed-shared memory) parallelism, such as combined message-passing interface and open multiprocessing (MPI-OMP), should work well for both computational efficiency and memory use. In this work, we compare and contrast the single-paradigm programming models, MPI or OMP, with the mixed-paradigm MPI-OMP programming model for a class of solver methods suited to the different modes of parallelism. The results showed that the distributed-memory (MPI-only) model has superior multi-compute-node scalability, whereas the shared-memory (OMP-only) model has superior parallel performance on a single compute node. The mixed MPI-OMP model and the OMP-only model are more memory-efficient on the multicore architecture than the MPI-only model because they require less or no halo-cell storage for the subdomains. To exploit the fine-grain shared-memory parallelism available on the GPGPU architecture, algorithms should be suited to single-instruction multiple-data (SIMD) parallelism, with any recursive operations serialized.
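
To make the halo-cell argument concrete, here is a back-of-the-envelope Python sketch. It assumes a hypothetical 120 × 120 × 120 grid, chosen only because it has the same 1.728 million cells as the model quoted for this work; the actual model dimensions are not given here. It also assumes a slab decomposition along one axis with one ghost layer, and a 12-core node (two 6-core X5675 CPUs). The function name `halo_cells` and the rank counts per paradigm are illustrative assumptions, not the authors' partitioning scheme.

```python
def halo_cells(nx: int, ny: int, nz: int, subdomains: int, width: int = 1) -> int:
    """Ghost (halo) cells when an nx*ny*nz grid is cut into slabs along z.

    Hypothetical model: every internal cut adds `width` ghost layers of
    nx*ny cells to each of the two neighbouring slabs.
    """
    if not 1 <= subdomains <= nz:
        raise ValueError("need 1 <= subdomains <= nz")
    return 2 * (subdomains - 1) * width * nx * ny


if __name__ == "__main__":
    n = 120  # hypothetical cube with 1.728 million cells
    for label, ranks in [
        ("MPI-only, 12 ranks", 12),
        ("MPI-OMP, 2 ranks x 6 threads", 2),
        ("OMP-only, 1 process", 1),
    ]:
        h = halo_cells(n, n, n, ranks)
        print(f"{label}: {h} halo cells ({100 * h / n**3:.1f}% overhead)")
    # MPI-only, 12 ranks: 316800 halo cells (18.3% overhead)
    # MPI-OMP, 2 ranks x 6 threads: 28800 halo cells (1.7% overhead)
    # OMP-only, 1 process: 0 halo cells (0.0% overhead)
```

Under these assumptions, fewer distributed-memory subdomains per node means fewer duplicated boundary cells. Real simulators often use 2-D or 3-D partitions and wider stencils, which change the numbers but not the trend.
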
On the GPGPU, solver methods and data storage also need to be reworked to coalesce memory access and to avoid shared-memory bank conflicts. Wherever possible, the cost of data transfer between the CPU and GPGPU over the peripheral component interconnect express (PCIe) bus should be hidden by means of asynchronous communication. We applied multiparadigm parallelism to accelerate compositional reservoir simulation on a GPGPU-equipped PC cluster. On a dual-CPU, dual-GPGPU compute node, the parallelized solver running on dual Fermi M2090Q GPGPUs achieved up to 19 times speedup over the serial CPU (1-core) results and up to 3.7 times speedup over the parallel dual-CPU X5675 results in a mixed MPI + OMP paradigm for a 1.728-million-cell compositional model. Parallel performance depends strongly on subdomain size. Parallel CPU solves perform better on smaller domain partitions, whereas GPGPU solves require large partitions on each chip for good parallel performance. This is related to improved cache efficiency on the CPU for small subdomains and to the loading requirement for massive parallelism on the GPGPU. Therefore, for a given model, the multinode parallel performance of the GPGPU decreases relative to the CPU as the model is further subdivided into smaller subdomains to be solved on more compute nodes. To illustrate this, a modified SPE5 (Killough and Kossack 1987) model with various grid dimensions was run to generate comparative results. Parallel performance results for three compositional field models of various sizes and dimensions are included to further elucidate and contrast CPU-GPGPU single-node and multiple-node performance. A PC cluster with the Tesla M2070Q GPGPU and the 6-core Xeon X5675 Westmere was used to produce the majority of the reported results. Another PC cluster with the Tesla M2090Q GPGPU was available for some cases, and its results are reported for the modified SPE5 (Killough and Kossack 1987) problems for comparison.
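
Coalescing memory access usually means laying data out so that neighbouring GPU threads read neighbouring addresses. The following Python sketch is a simplified counting model, not a description of the authors' data store. It assumes a 32-thread warp, 128-byte aligned memory segments, 8-byte (double-precision) values, and a hypothetical cell record of 8 properties. It counts how many segments one warp touches when each thread reads the same property for its own cell, comparing an array-of-structs layout with a struct-of-arrays layout. The names `warp_addresses` and `segments_touched` are illustrative, and real hardware rules vary by GPU generation and caching mode.

```python
from typing import List

WARP = 32         # threads per warp (assumed)
SEGMENT = 128     # bytes per aligned memory segment (assumed)
ELEM = 8          # bytes per double-precision value
FIELDS = 8        # hypothetical number of properties stored per cell


def warp_addresses(stride_bytes: int, warp: int = WARP) -> List[int]:
    """Byte address read by each thread when thread i touches element i."""
    return [i * stride_bytes for i in range(warp)]


def segments_touched(addresses: List[int], segment_bytes: int = SEGMENT) -> int:
    """Number of distinct aligned segments the warp's reads fall into."""
    return len({a // segment_bytes for a in addresses})


if __name__ == "__main__":
    # Array-of-structs: consecutive cells are FIELDS * ELEM bytes apart.
    aos = segments_touched(warp_addresses(FIELDS * ELEM))
    # Struct-of-arrays: one property of consecutive cells is contiguous.
    soa = segments_touched(warp_addresses(ELEM))
    print(aos, soa)  # 16 2
```

In this model the struct-of-arrays layout needs 8 times fewer memory segments per warp read, which is the kind of saving that reworking the data store for coalesced access aims at.
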

Articles published on Large Random Access Memory

Processing Next-Generation Mass Spectrometry Imaging Data: Principal Component Analysis at Scale.

Ultra8T: A sub-threshold 8T SRAM with leakage detection

Effect of temperature on structural, dynamical, and electronic properties of Sc2Te3 from first-principles calculations.

Validation of android-based mobile application for retrieving network signal level

3-D TCAD Study of the Implications of Channel Width and Interface States on FD-SOI Z2-FETs

Proficient Static RAM design using Sleepy Keeper Leakage Control Transistor & PT-Decoder for handheld application

SMFrWF: Segmented Modified Fractional Wavelet Filter: Fast Low-Memory Discrete Wavelet Transform (DWT)

Design of CMOS Compatible, High‐Speed, Highly‐Stable Complementary Switching with Multilevel Operation in 3D Vertically Stacked Novel HfO2/Al2O3/TiOx (HAT) RRAM

A Novel Architecture of Large Hybrid Cache With Reduced Energy

Statistical analysis of the correlations between cell performance and its initial states in contact resistive random access memory cells

A Novel Read Scheme for Large Size One-Resistor Resistive Random Access Memory Array

Multiparadigm Parallel Acceleration for Reservoir Simulation

E-Passport Threats

Estimation of thermal durability and intrinsic critical currents of magnetization switching for spin-transfer based magnetic random access memory

(Ba,Sr)TiO3 thin films for ultra-large-scale dynamic random access memory: A review on the process integration

A fast algorithm for computing minimum cross-entropy positive time-frequency distributions

Optoelectronic-cache memory system architecture

Estimation of the LET threshold of single event upset of microelectronics in experiments with Cf-252

New approaches for the repairs of memories with redundancy by row/column deletion for yield enhancement

Imaging cytoplasmic free calcium in histamine stimulated endothelial cells and in fMet-Leu-Phe stimulated neutrophils