Abstract

In this work, we propose a digital flash computing-in-memory (CIM) architecture based on a compressed lookup-table multiplier (CLUTM) and a low-power word-line voltage trimming (LP-WLVT) scheme. The proposed concept is highly compatible with standard commodity NOR flash memory. Compared with conventional lookup-table (LUT) multipliers, the CLUTM reduces area cost by 32× for 8-bit multiplication, and the LP-WLVT scheme further reduces inference power by 14%. The concept is demonstrated in silicon on a 55 nm 32 Mb commercial flash memory, which performs 8-bit multiply-and-accumulate (MAC) operations at a throughput of 51.2 GOPS. It achieves a 1.778 ms frame shift when running the TC-ResNet8 network, 5× more efficient than previous works. The CLUTM-based digital CIM architecture can play an important role in enabling commercial flash for highly efficient AI edge inference.
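As background for the LUT-compression idea, a rough software sketch (not the paper's CLUTM circuit; names and the nibble decomposition are illustrative assumptions) shows why splitting operands shrinks the table: a direct 8×8-bit multiplier LUT needs 2^8 × 2^8 = 65,536 entries, while decomposing each operand into 4-bit nibbles lets a single 256-entry table serve all four partial products. The paper's specific 32× area figure comes from its own compression scheme, which this sketch does not reproduce.

```python
# Illustrative sketch only; NIBBLE_LUT and lut_mul8 are hypothetical names,
# not identifiers from the paper.
NIBBLE_LUT = [[a * b for b in range(16)] for a in range(16)]  # 16x16 = 256 entries

def lut_mul8(x: int, y: int) -> int:
    """8-bit unsigned multiply using only the 4-bit nibble LUT."""
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    # Combine the four partial products with shift-and-add
    # (no hardware multiplier needed, only table lookups).
    return ((NIBBLE_LUT[xh][yh] << 8)
            + ((NIBBLE_LUT[xh][yl] + NIBBLE_LUT[xl][yh]) << 4)
            + NIBBLE_LUT[xl][yl])

full_lut_entries = 256 * 256   # 65,536 entries for a direct 8x8 LUT
nibble_lut_entries = 16 * 16   # 256 entries after nibble decomposition
```

The shift-and-add recombination follows from x·y = (16·xh + xl)(16·yh + yl), expanded into four partial products weighted by 256, 16, and 1.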
