Abstract

Speech recognition is one of many artificial intelligence applications. It converts spoken words into text and appears in many everyday use cases, notably to improve accessibility. Google Assistant and Amazon's Alexa are among the best-known speech recognition tools. However, European companies cannot use these solutions, because they must guarantee data sovereignty. Moreover, these solutions cannot be customized, so they cannot be adapted to new accents or new vocabularies. To cope with these problems, one can either use European Automatic Speech Recognition (ASR) solutions or build one's own customized models with well-known open-source toolkits such as DeepSpeech or Kaldi. Choosing between Kaldi and DeepSpeech is therefore an important decision. We judge the two on two criteria: accuracy and inference time. In this paper, we present a theoretical and experimental comparison of DeepSpeech and Kaldi. We also include Vosk and LinTO, two open-source solutions built on top of Kaldi, in the comparison.
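ASR comparisons commonly report accuracy as word error rate (WER) and inference time as the real-time factor (RTF). The abstract does not name its exact metrics, so the following Python sketch simply assumes these two common definitions: WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and RTF as processing time divided by audio duration. The helpers `word_error_rate` and `real_time_factor` are hypothetical and are not part of Kaldi, DeepSpeech, Vosk, or LinTO.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF below 1.0 means the recognizer runs faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # One deletion ("the") and one substitution ("light" -> "lights") over 5 words.
    print(word_error_rate("turn on the kitchen light", "turn on kitchen lights"))  # 0.4
    # 3 s of decoding for 12 s of audio.
    print(real_time_factor(3.0, 12.0))  # 0.25
```

Lower is better for both metrics, which is why they can be traded off against each other when choosing between toolkits.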
