Abstract

Near-data in-memory processing research has been gaining momentum in recent years. A typical processing-in-memory architecture places one or several processing elements next to volatile memory, enabling processing without transferring data to the host CPU. The increased bandwidth to and from volatile memory yields a performance gain. However, processing-in-memory does not alleviate the von Neumann bottleneck for big data problems, where datasets are too large to fit in main memory. We present a novel processing-in-storage system based on Resistive Content Addressable Memory (ReCAM). It functions simultaneously as mass storage and as a massively parallel associative processor. ReCAM processing-in-storage resolves the bandwidth wall by keeping computation inside the storage arrays, without transferring data up the memory hierarchy. We show that a ReCAM-based processing-in-storage architecture may outperform existing processing-in-memory and accelerator-based designs. A ReCAM processing-in-storage implementation of Smith-Waterman DNA sequence alignment reaches a speedup of almost five over a GPU cluster. An implementation of in-storage inline data deduplication is presented and shown to achieve orders-of-magnitude higher throughput than traditional CPU- and DRAM-based systems.

Highlights

  • Until the breakdown of Dennard scaling, designers focused on improving the performance of a single core by increasing instruction-level parallelism

  • Resistive Content Addressable Memory (ReCAM), a storage device that combines emerging resistive materials in the bitcell with a novel non-von Neumann Processing-in-Storage (PRinS) compute paradigm, is proposed to mitigate the storage bandwidth bottleneck of big data processing

  • The parallel compare and parallel write operations supported by CAM are used to implement an “if condition, then value” expression, as sketched below

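The conditional expression in the last highlight can be illustrated in software. The following is a minimal Python sketch, not the paper's actual interface: `AssociativeArray`, `compare`, and `write` are hypothetical names modeling a CAM in which a masked compare tags all matching rows at once, and a subsequent write updates all tagged rows in parallel.

```python
# A minimal software model of associative processing on a CAM array,
# assuming a word-per-row layout (names and interface are illustrative,
# not the paper's actual design).

class AssociativeArray:
    def __init__(self, rows):
        self.rows = rows  # each row holds one integer word

    def compare(self, key, mask):
        # Parallel compare: every row checks (row & mask) == (key & mask);
        # in a CAM this tags all matching rows in a single operation.
        return [(r & mask) == (key & mask) for r in self.rows]

    def write(self, tags, value, mask):
        # Parallel write: every tagged row updates its masked bits,
        # completing the "if condition, then value" step.
        for i, tagged in enumerate(tags):
            if tagged:
                self.rows[i] = (self.rows[i] & ~mask) | (value & mask)

# Example: in every row whose low nibble equals 0x3, set the high nibble to 0x5.
arr = AssociativeArray([0x13, 0x27, 0x93, 0x43])
tags = arr.compare(key=0x03, mask=0x0F)    # condition on the low nibble
arr.write(tags, value=0x50, mask=0xF0)     # conditional parallel write
print([hex(r) for r in arr.rows])          # ['0x53', '0x27', '0x53', '0x53']
```

Because every row evaluates the condition and applies the write simultaneously, the runtime of one such step is independent of the number of rows, which is the source of the massive parallelism claimed for associative processing.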

Summary

Introduction

Until the breakdown of Dennard scaling, designers focused on improving the performance of a single core by increasing instruction-level parallelism. Memory bandwidth has not improved at the same rate, making the von Neumann bottleneck one of the main performance-limiting factors. The problem worsens in datacenter cloud environments, where datasets are distributed among multiple nodes across the datacenter. In such cases, data transfer adds latency and reduces bandwidth even further, lowering the performance upper bound. This challenge has motivated renewed interest in Near-Data Processing (NDP) [7]. Resistive CAM (ReCAM), a storage device that combines emerging resistive materials in the bitcell with a novel non-von Neumann Processing-in-Storage (PRinS) compute paradigm, is proposed to mitigate the storage bandwidth bottleneck of big data processing. PRinS implementations of two algorithms, Smith-Waterman DNA sequence alignment and in-storage data deduplication, are presented in Sections 3 and 4 and compared to other approaches.
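For context, Smith-Waterman fills a dynamic-programming score matrix in which all cells on one anti-diagonal depend only on the two preceding anti-diagonals and are therefore mutually independent; this wavefront parallelism is the kind a massively parallel associative processor can exploit. The sketch below is an illustrative sequential Python model organized by anti-diagonals, with an assumed linear gap penalty and example scoring parameters; it is not the paper's ReCAM implementation.

```python
# A minimal sketch of Smith-Waterman local alignment scoring, swept by
# anti-diagonals: cells within one anti-diagonal are independent and
# could be scored simultaneously by parallel hardware. Linear gap
# penalty and the scoring parameters below are assumptions for brevity.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # score matrix, zero borders
    best = 0
    # Sweep anti-diagonals d = i + j; all cells with the same d are
    # independent of one another.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GATTACA", "GATTACA"))  # 14 (7 matches x 2)
```

In hardware, the inner loop collapses into one parallel step per anti-diagonal, so scoring time grows with the sum of the sequence lengths rather than with their product.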

Background and Related Work
Content Addressable Memory and Associative Processing
Resistive Memories
Related Work
Processing-in-Memory with Resistive Materials
Near-Data Processing-in-Storage
Processing-in-Storage with ReCAM
ReCAM Crossbar Array
System Architecture
PRinS Application
Simulation and Comparison to State-of-the-art
In-Storage ReCAM-Based Deduplication
ReCAM-Based Deduplication Algorithm
In-Storage Deduplication Evaluations
Conclusions
