Abstract

We present a new approach to automatic speech recognition (ASR) based on word detection and knowledge-based verification. Given a known vocabulary, we first design a word detector for every word in the vocabulary. Pruning strategies then eliminate unlikely word candidates, and the surviving words are combined into word strings. Unlike conventional maximum a posteriori decoding, the proposed approach is a bottom-up, detection-based speech recognition system that incorporates knowledge of acoustics, speech, and language into pruning and rescoring. The approach was evaluated on a connected digit task using phone models trained on the TIMIT corpus. Even though no digit samples were used to train the detectors and recognizers, the detection-based framework performed well compared with current connected digit recognition methods. Other knowledge-based constraints, such as manner and place of articulation detectors, can be incorporated into this approach to improve the robustness and performance of the overall system.
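
The detect, prune, and combine flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` record, the fixed score threshold in `prune`, and the dynamic program in `best_word_string` (which picks the highest-scoring sequence of non-overlapping detections) are all hypothetical stand-ins chosen for illustration, and the scores are made-up positive confidences rather than real detector outputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of one word detector (not from the paper)."""
    word: str
    start: int    # first frame of the detected segment (inclusive)
    end: int      # last frame of the detected segment (exclusive)
    score: float  # assumed positive confidence, higher is better


def prune(detections: List[Detection], threshold: float) -> List[Detection]:
    """Drop unlikely candidates; a plain score threshold is an assumption."""
    return [d for d in detections if d.score >= threshold]


def best_word_string(detections: List[Detection]) -> List[str]:
    """Combine non-overlapping detections into the highest-scoring string.

    Simple O(n^2) dynamic program over detections sorted by end frame.
    """
    ordered = sorted(detections, key=lambda d: d.end)
    # best[i] = (total score, words) of the best string ending with ordered[i]
    best = []
    for i, d in enumerate(ordered):
        top_score, top_words = 0.0, []
        for j in range(i):
            # ordered[j] may precede d only if it ends before d starts
            if ordered[j].end <= d.start and best[j][0] > top_score:
                top_score, top_words = best[j]
        best.append((top_score + d.score, top_words + [d.word]))
    if not best:
        return []
    return max(best, key=lambda entry: entry[0])[1]


# Made-up candidates for a three-digit utterance
candidates = [
    Detection("one", 0, 10, 0.9),
    Detection("nine", 0, 12, 0.4),
    Detection("two", 10, 20, 0.8),
    Detection("oh", 8, 14, 0.2),
    Detection("three", 20, 30, 0.7),
]
result = best_word_string(prune(candidates, threshold=0.3))
print(result)  # ['one', 'two', 'three']
```

In a system of the kind the abstract describes, the pruning and rescoring steps would draw on acoustic, speech, and language knowledge rather than a single threshold; the sketch only shows where those steps sit in a bottom-up pipeline.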
