Abstract

Near-data in-memory processing research has been gaining momentum in recent years. A typical processing-in-memory architecture places one or several processing elements next to volatile memory, enabling processing without transferring data to the host CPU. The increased bandwidth to and from volatile memory yields a performance gain. However, processing-in-memory does not alleviate the von Neumann bottleneck for big data problems, where datasets are too large to fit in main memory. We present a novel processing-in-storage system based on Resistive Content Addressable Memory (ReCAM). It functions simultaneously as mass storage and as a massively parallel associative processor. ReCAM processing-in-storage resolves the bandwidth wall by keeping computation inside the storage arrays, without transferring data up the memory hierarchy. We show that a ReCAM-based processing-in-storage architecture may outperform existing processing-in-memory and accelerator-based designs. A ReCAM processing-in-storage implementation of Smith-Waterman DNA sequence alignment reaches a speedup of almost five over a GPU cluster. An implementation of in-storage inline data deduplication is presented and shown to achieve orders-of-magnitude higher throughput than traditional CPU- and DRAM-based systems.

Highlights

  • Until the breakdown of Dennard scaling, designers focused on improving the performance of a single core by increasing instruction-level parallelism

  • Resistive Content Addressable Memory (ReCAM), a storage device that combines emerging resistive materials in the bitcell with a novel non-von Neumann Processing-in-Storage (PRinS) compute paradigm, is proposed to mitigate the storage bandwidth bottleneck of big data processing

  • The parallel compare and parallel write operations supported by CAM are used to implement an “if condition, then value” expression, as sketched below

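The conditional expression in the last highlight can be illustrated in software. The following is a minimal Python sketch, not the paper's actual interface: `AssociativeArray`, `compare`, and `write` are hypothetical names modeling a CAM in which a masked compare tags all matching rows at once, and a subsequent write updates all tagged rows in parallel.

```python
# A minimal software model of associative processing on a CAM array,
# assuming a word-per-row layout (names and interface are illustrative,
# not the paper's actual design).

class AssociativeArray:
    def __init__(self, rows):
        self.rows = rows  # each row holds one integer word

    def compare(self, key, mask):
        # Parallel compare: every row checks (row & mask) == (key & mask);
        # in a CAM this tags all matching rows in a single operation.
        return [(r & mask) == (key & mask) for r in self.rows]

    def write(self, tags, value, mask):
        # Parallel write: every tagged row updates its masked bits,
        # completing the "if condition, then value" step.
        for i, tagged in enumerate(tags):
            if tagged:
                self.rows[i] = (self.rows[i] & ~mask) | (value & mask)

# Example: in every row whose low nibble equals 0x3, set the high nibble to 0x5.
arr = AssociativeArray([0x13, 0x27, 0x93, 0x43])
tags = arr.compare(key=0x03, mask=0x0F)    # condition on the low nibble
arr.write(tags, value=0x50, mask=0xF0)     # conditional parallel write
print([hex(r) for r in arr.rows])          # ['0x53', '0x27', '0x53', '0x53']
```

Because every row evaluates the condition and applies the write simultaneously, the runtime of one such step is independent of the number of rows, which is the source of the massive parallelism claimed for associative processing.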

Summary

Introduction

Until the breakdown of Dennard scaling, designers focused on improving the performance of a single core by increasing instruction-level parallelism. Memory bandwidth has not improved at the same rate, making the von Neumann bottleneck one of the main performance-limiting factors. The problem worsens in datacenter cloud environments, where datasets are distributed among multiple nodes across the datacenter. In such cases, data transfer adds latency and reduces bandwidth even further, lowering the performance upper bound. This challenge has motivated renewed interest in Near-Data Processing (NDP) [7]. Resistive CAM (ReCAM), a storage device that combines emerging resistive materials in the bitcell with a novel non-von Neumann Processing-in-Storage (PRinS) compute paradigm, is proposed to mitigate the storage bandwidth bottleneck of big data processing. PRinS implementations of two algorithms, Smith-Waterman DNA sequence alignment and in-storage data deduplication, are presented in Sections 3 and 4 and compared to other approaches.
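For context, Smith-Waterman fills a dynamic-programming score matrix in which all cells on one anti-diagonal depend only on the two preceding anti-diagonals and are therefore mutually independent; this wavefront parallelism is the kind a massively parallel associative processor can exploit. The sketch below is an illustrative sequential Python model organized by anti-diagonals, with an assumed linear gap penalty and example scoring parameters; it is not the paper's ReCAM implementation.

```python
# A minimal sketch of Smith-Waterman local alignment scoring, swept by
# anti-diagonals: cells within one anti-diagonal are independent and
# could be scored simultaneously by parallel hardware. Linear gap
# penalty and the scoring parameters below are assumptions for brevity.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # score matrix, zero borders
    best = 0
    # Sweep anti-diagonals d = i + j; all cells with the same d are
    # independent of one another.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GATTACA", "GATTACA"))  # 14 (7 matches x 2)
```

In hardware, the inner loop collapses into one parallel step per anti-diagonal, so scoring time grows with the sum of the sequence lengths rather than with their product.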

Background and Related Work
Content Addressable Memory and Associative Processing
Resistive Memories
Related Work
Processing-in-Memory with Resistive Materials
Near-Data Processing-in-Storage
Processing-in-Storage with ReCAM
ReCAM Crossbar Array
System Architecture
PRinS Application
Simulation and Comparison to State-of-the-art
In-Storage ReCAM-Based Deduplication
ReCAM-Based Deduplication Algorithm
In-Storage Deduplication Evaluations
Conclusions
