Abstract

A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex-5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search over a 5000-word vocabulary, has provided an accuracy of over 80%. Parallel implementations with up to 32 cores have been designed and successfully implemented at a clock frequency of 133 MHz.
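For context on the timing figures above, 9.03 ms for 3825 models corresponds to roughly 2.4 µs per acoustic model. Assuming the conventional 10 ms feature-frame shift used by most HMM recognizers (an assumption; the abstract does not state the frame rate), a full set of Gaussian scores is produced within a single frame period, which is consistent with the faster-than-real-time operation reported below.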

Highlights

  • Automated Speech Recognition (ASR) systems can revolutionize the way that we interact with technology

  • Over the last five years, research on high-performance ASR has focused increasingly on hardware implementations. Many FPGA-based speech recognition systems have been built, but they have generally been limited to small vocabularies [7, 8] or have relied on custom hardware to provide the resources required for a large-vocabulary system [9]

  • The Gaussian calculation produces observation probabilities for the input speech, which are passed to the backend search; a minimal software sketch of this computation follows the list

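This summary does not reproduce the acoustic likelihood formula, but in an HMM-based recognizer the Gaussian calculation is typically a diagonal-covariance log-likelihood of each feature vector against each acoustic model's mean and variance. The sketch below is a minimal software reference for that computation, assuming a 39-dimensional feature vector and precomputed inverse variances and normalization constant; the names and dimensions are illustrative and not taken from the paper.

    #define FEAT_DIM 39  /* assumed feature dimension (e.g. MFCCs plus deltas) */

    /* Diagonal-covariance Gaussian log-likelihood of one feature vector x
     * against one acoustic model. gconst folds in the constant term
     * -0.5 * (FEAT_DIM * log(2*pi) + sum(log(var[d]))), which would be
     * precomputed offline along with the inverse variances. */
    double gaussian_log_likelihood(const double x[FEAT_DIM],
                                   const double mean[FEAT_DIM],
                                   const double inv_var[FEAT_DIM],
                                   double gconst)
    {
        double acc = 0.0;
        for (int d = 0; d < FEAT_DIM; d++) {
            double diff = x[d] - mean[d];
            acc += diff * diff * inv_var[d];
        }
        return gconst - 0.5 * acc;  /* log-probability passed to the backend search */
    }

This per-model inner loop is presumably what the pipelined FPGA core evaluates and what the parallel variants with up to 32 cores replicate, although the hardware datapath itself is described in the full text rather than here.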

Summary

Introduction

Automated Speech Recognition (ASR) systems can revolutionize the way that we interact with technology. The approach of pairing a softcore processor with a custom IP peripheral is popular and has been proposed in a number of papers [8, 10], but a system operating on large vocabularies in real time is yet to be demonstrated. This is due in part to the low operating frequencies of softcore processors, but another problem is interfacing with off-chip, high-capacity RAM, which can introduce large delays that cripple a high-bandwidth system such as speech recognition. Through profiling of the software system, the Gaussian calculation has been identified as a performance bottleneck. For this reason, a custom Gaussian core was developed with a simple interface that allows it to be implemented either as an embedded peripheral in a system-on-chip speech recognition system or as an FPGA hardware accelerator for use with a desktop or server software system. Both the software proof of concept and the multicore implementation of the Gaussian core have been demonstrated to run faster than real time.
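The register-level interface of the core is not given in this summary, so the fragment below only sketches the offload pattern just described: per feature frame, the host streams one feature vector to the Gaussian core and reads back a score per acoustic model for the software backend search. All function names are hypothetical stubs standing in for the real peripheral or accelerator driver.

    #include <stdint.h>
    #include <string.h>

    #define FEAT_DIM   39    /* assumed feature dimension */
    #define NUM_MODELS 3825  /* model count quoted in the abstract */

    /* Hypothetical stubs standing in for the real core interface
     * (memory-mapped peripheral registers or an accelerator DMA channel). */
    static void gcore_write_frame(const int32_t feat[FEAT_DIM]) { (void)feat; }
    static void gcore_read_scores(int32_t scores[NUM_MODELS])
    {
        memset(scores, 0, NUM_MODELS * sizeof(int32_t));
    }
    static void backend_search_step(const int32_t scores[NUM_MODELS]) { (void)scores; }

    /* Per-frame offload pattern: acoustic scoring runs on the Gaussian core,
     * while the WFST/backend search remains in software. */
    void recognize_frame(const int32_t feat[FEAT_DIM])
    {
        static int32_t scores[NUM_MODELS];

        gcore_write_frame(feat);      /* stream one feature vector to the core */
        gcore_read_scores(scores);    /* one log-likelihood per acoustic model */
        backend_search_step(scores);  /* software backend consumes the scores  */
    }

The same calling pattern serves both deployment targets named above: as an embedded peripheral the stubs would wrap register reads and writes from a softcore processor, while as a hardware accelerator they would wrap transfers from the desktop or server software system.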

HMM-Based Continuous Speech Recognition
Software Performance
FPGA Design Decisions
FPGA Implementation
Results
Findings
Conclusions