Abstract

Large-scale parallel implementation of a matrix multiply-and-accumulate (MAC) core poses significant energy and area constraints in the analog voltage domain under reduced supply voltage. A spatial multi-bit sub-1-V time-domain matrix multiplier interface is presented, using multi-bit back-gate-driven delay elements as a scalable alternative for various approximate computing applications. A single-chip solution is demonstrated for two application modes: a high-throughput digitally driven mode for acceleration and a low-energy analog front-end mode for sensing. In accelerate mode, the system achieves an aggregate throughput of 21.6 GMAC/s with an energy efficiency of 9 TOPS/W. In sense mode, the system exhibits an energy efficiency of 55.3 TOPS/W for classification. The proposed architecture uses 16 parallel 6-bit input vectors to perform matrix MAC computations using time-domain signal processing with 3-bit resistive weights at a sub-1-V supply of 0.7 V. An integrated speculative time-to-digital converter is employed for 6-bit time-domain quantization with an on-chip mismatch calibration scheme. The prototype is fabricated in 65-nm CMOS technology and occupies an active area of 0.04 mm². The system performs image recognition of handwritten digits using a machine learning scheme and demonstrates an average classification accuracy of 84.3% on the MNIST dataset. The resultant energy per MAC computation in the proposed spatial architecture is about 15× lower than that of a digital CMOS combinational logic-based parallel-tree MAC.
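
The bit widths quoted in the abstract (16 parallel inputs, 6-bit inputs, 3-bit weights, 6-bit output quantization) can be made concrete with a small behavioral sketch. This is an idealized illustration, not the authors' circuit: it assumes that each lane contributes a delay linearly proportional to the product of its input and weight, that weights are unsigned, and that the accumulated delay is quantized uniformly over its full-scale range. The function name `time_domain_mac` and the `unit_delay` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

# Resolutions taken from the abstract.
N_INPUTS = 16     # parallel input lanes
INPUT_BITS = 6    # input resolution
WEIGHT_BITS = 3   # weight resolution
OUT_BITS = 6      # time-domain quantizer resolution


def time_domain_mac(x, w, unit_delay=1.0):
    """Idealized model of one time-domain MAC (assumptions, not the chip).

    Each lane is assumed to add a delay of unit_delay * x[i] * w[i].
    The summed delay is mapped uniformly onto a 6-bit output code.
    Returns (total_delay, output_code).
    """
    x = np.asarray(x, dtype=np.int64)
    w = np.asarray(w, dtype=np.int64)
    if x.shape != (N_INPUTS,) or w.shape != (N_INPUTS,):
        raise ValueError("expected 16 inputs and 16 weights")
    if x.min() < 0 or x.max() > 2**INPUT_BITS - 1:
        raise ValueError("inputs must fit in 6 unsigned bits")
    if w.min() < 0 or w.max() > 2**WEIGHT_BITS - 1:
        raise ValueError("weights must fit in 3 unsigned bits")

    # Accumulated delay: the dot product expressed in units of time.
    total_delay = unit_delay * float(np.dot(x, w))

    # Largest delay the array can produce under these assumptions.
    full_scale = (unit_delay * N_INPUTS
                  * (2**INPUT_BITS - 1) * (2**WEIGHT_BITS - 1))

    # Uniform quantization of the delay to a 6-bit code.
    levels = 2**OUT_BITS - 1
    code = int(round(total_delay / full_scale * levels))
    return total_delay, code
```

Under these assumptions, driving every lane with the maximum input (63) and maximum weight (7) yields the full-scale output code, while an all-zero input yields code 0. The sketch also shows why a 6-bit output is necessarily approximate: the exact dot product spans a much wider range than 64 levels can resolve, which is consistent with the approximate computing setting the abstract describes.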
