Abstract Speech, one of the earliest forms of human communication, conveys information effectively. However, current deep neural network models for speech recognition are generally large in scale and can typically only be deployed in the cloud, which imposes demanding requirements on the deployment environment and power consumption and thereby limits their use on embedded devices. End-to-end speech recognition in this setting faces a series of challenges, including power constraints, limited computing capability, network dependence, privacy protection, bandwidth restrictions, and communication latency. To address these issues, this paper presents the design of an end-to-end voice command recognition chip based on deep neural networks, targeted at recognizing voice commands in specific scenarios while achieving low power consumption and low recognition latency. In addition, we introduce a chip architecture with reloadable weights, enabling seamless migration across scenarios and ultimately aiming to resolve the aforementioned challenges.