Abstract

Endpoint artificial intelligence (AI) requires high flexibility in AI processes under strict cost and power limitations. This work aims to achieve a chip capable of executing AI processes at low power while periodically switching the context of multiple neural networks (NNs) in a small chip area. Transistors fabricated using a crystalline oxide semiconductor (OS) such as indium-gallium-zinc oxide exhibit an extremely low off-state current. Such transistors are highly compatible with Si CMOS processes, and multiple OS transistor layers can be stacked [1]. A normally-off (Noff) CPU that uses OS memory as flip-flop (FF) backup memory to enable power gating (PG) has been reported [2]. Pairing the Noff CPU with a high-efficiency AI accelerator (ACC) could be a candidate structure for an endpoint AI chip. However, the ACC requires large-scale memory to switch between AI processes; without it, power and time are wasted rewriting data, which makes power reduction infeasible. Moreover, adapting the chip to another NN requires context switching in which both the weight data and the FF data are switched quickly. The challenge is to secure large-scale memory and achieve context switching with low latency. To meet this challenge, the ACC memory for NN weight data, the FF backup memory, and the CPU memory used as instruction and data memory are 3D stacked via an OS transistor stacking technique in which the OS memory in each layer serves as a bank (Fig. 13.1.1).
As proof of this concept, a test chip was fabricated through the OS/OS/Si process (130 nm Si CMOS and two layers of 200 nm OS). In the system, bank switching of the ACC memory is linked with bank switching of the FF backup memory, and inference of different NNs is switched with low latency and power so that the PG standby time is extended.
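
The linked bank switching described above can be pictured with a small behavioral model. This is only an illustrative sketch, not the chip's actual control logic: the class names, the two-bank count (chosen to mirror the two OS layers), and the placeholder strings for weights and FF state are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class BankedMemory:
    """Hypothetical memory whose banks stand in for the stacked OS layers."""
    banks: list
    active: int = 0

    def read(self):
        # Only the currently selected bank is visible to the logic.
        return self.banks[self.active]


@dataclass
class EndpointChip:
    """Hypothetical model: ACC weight memory and FF backup memory share a bank select."""
    acc_weights: BankedMemory
    ff_backup: BankedMemory

    def switch_context(self, bank: int) -> None:
        # Linked switching: both memories move to the same bank together,
        # so no weight data or FF data has to be rewritten.
        if not 0 <= bank < len(self.acc_weights.banks):
            raise ValueError("bank index out of range")
        self.acc_weights.active = bank
        self.ff_backup.active = bank

    def current_context(self):
        return self.acc_weights.read(), self.ff_backup.read()


# Assumed example data: one NN context per bank.
chip = EndpointChip(
    acc_weights=BankedMemory(["weights_nn_a", "weights_nn_b"]),
    ff_backup=BankedMemory(["ff_state_a", "ff_state_b"]),
)
chip.switch_context(1)  # select the second NN's weights and FF state together
```

In this model a context switch changes only a bank index, which is the property the abstract relies on for low switching latency and power.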
