Abstract

This paper presents an automatic speech–speaker recognition (ASSR) system implemented in a chip which includes a built-in extraction, recognition, and training (ERT) core. For VLSI design (here, ASSR system), the hardware cost and time complexity are always the important issues which are improved in this proposed design in two levels: 1) algorithmic and 2) architecture. At the algorithm level, a newly binary-halved clustering (BHC) is proposed to achieve low time complexity and low memory requirement. In addition, at the architecture level, a new ERT core is proposed and implemented based on data dependence and reuse mechanism to reduce the time and hardware cost as well. Finally, the chip implementation is synthesized, placed, and routed using TSMC 90-nm technology library. To verify the performance of the proposed BHC method, a case study is performed based on nine speakers. Moreover, the validation of the ASSR system is examined in two parts: 1) speech recognition and 2) speaker recognition. The results show that the proposed system can achieve 93.38% and 87.56% of recognition rates during speech and speaker recognition, respectively. Furthermore, the proposed ASSR chip includes 396k gate counts, and consumes power in 8.74 mW. Such results demonstrate that the performance of the proposed ASSR system is superior to the conventional systems.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call