Evaluation of the efficiency of state-of-the-art Speech Recognition engines

Asma Trabelsi, Sébastien Warichet, Yassine Aajaoun, Séverine Soussilane

doi:10.1016/j.procs.2022.09.534

Open Access

Journal: Procedia Computer Science	Publication Date: Jan 1, 2022
Citations: 8	License type: cc-by-nc-nd

Affiliation: Université de Lorraine

Abstract

Speech Recognition is one of many applications of Artificial Intelligence. It converts spoken words into text and can support a variety of everyday use cases, notably accessibility. Google Assistant and Amazon's Alexa are among the best-known Speech Recognition tools. However, European companies cannot use these solutions because they must guarantee data sovereignty. Another important limitation is that these solutions cannot be customized, so it is not possible to handle new accents or new vocabularies. To cope with these problems, one can either use European Automatic Speech Recognition (ASR) solutions or build personalized models using well-known open-source tools such as DeepSpeech or Kaldi. Choosing between Kaldi and DeepSpeech is therefore an important task. The criteria for judging the best method are accuracy and inference time. In this paper, we present a theoretical and experimental comparison of DeepSpeech and Kaldi. Vosk and LinTO, open-source solutions built on top of Kaldi, are also included in the comparison.
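
The two criteria named in the abstract can be illustrated with a minimal Python sketch. It assumes that accuracy is measured as word error rate (WER), a common choice for ASR that the abstract itself does not specify, and that inference time is the wall-clock time of a single transcription call. The names `word_error_rate`, `timed_transcribe` and the `transcribe` callable are hypothetical and stand in for whichever engine is being evaluated.

```python
import time
from typing import Any, Callable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: accuracy is reported as WER; lower is better.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def timed_transcribe(transcribe: Callable[[Any], str], audio: Any) -> Tuple[str, float]:
    """Run a hypothetical transcribe(audio) callable and return (text, seconds)."""
    start = time.perf_counter()
    text = transcribe(audio)
    return text, time.perf_counter() - start
```

Wrapping each engine behind the same `transcribe` callable would let both measures be collected with identical code, so that differences reflect the engines rather than the measurement harness.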

Full Text