Evaluation of Automatic Speech Recognition Approaches

Regis Pires Magalhães, Daniel Jean Rodrigues Vasconcelos, Matheus Xavier Sampaio, Guilherme Sales Fernandes, Ticiana Linhares Coelho Da Silva, Lívia Almada Cruz, José Antônio Fernandes De Macêdo

doi:10.5753/jidm.2022.2514

Abstract

Automatic Speech Recognition (ASR) is essential for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Given the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work evaluates the performance of commercial solutions for ASR that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, Google Cloud Speech-to-Text, Wav2Vec, and AWS Transcribe. We ran the experiments on two real, public datasets: Mozilla Common Voice and Voxforge. The results show that the evaluated solutions differ only slightly; however, Facebook Wit.ai outperforms the other analyzed approaches on the quality metrics collected, such as WER, BLEU, and METEOR. We also fine-tune the Jasper neural network for ASR on four other datasets that have no intersection with the ones used to collect the quality metrics. We then study the performance of the Jasper model on the two public datasets and compare its results with those of the other pre-trained models.
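
To make the WER metric mentioned above concrete, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate` and the normalization (lowercasing and whitespace tokenization) are assumptions made for illustration; the abstract does not state which text normalization or tooling the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: transcripts are lowercased and split on whitespace.
    The paper's actual preprocessing is not described in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two of the four reference words are substituted.
    print(word_error_rate("turn on the lights", "turn off the light"))
```

Lower values are better, and the score can exceed 1.0 when the hypothesis contains many inserted words. BLEU and METEOR, by contrast, are overlap-based scores where higher is better.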

Published in: Journal of Information and Data Management
