Abstract

Client conversations in contact centers are now routinely recorded for a number of reasons, in many cases simply because current legislation requires it. Even when recording is not required, conversations between customers and agents can be a valuable source of information about current or prospective clients, call center agents, market trends, etc. Analyzing these recordings provides an excellent opportunity to learn about the business and its possibilities. The current state of the art in Automatic Speech Recognition (ASR) allows this information to be extracted and used effectively. However, conversations are usually stored in highly compressed formats to save space and, because Voice-over-IP (VoIP) is commonly used in these systems, they typically contain packet losses that produce short interruptions in the speech signal. These effects, and especially the latter, have a negative impact on ASR performance. This article presents an extensive study of the impact of these effects on modern ASR systems and of the effectiveness of several data augmentation techniques for increasing their robustness. In addition, the well-known ITU-T G.711 Packet Loss Concealment (PLC) method is applied in combination with data augmentation techniques to analyze the improvement in ASR performance on signals affected by packet losses.

Highlights

  • Most calls in call centers are recorded, in many cases simply because current legislation makes it mandatory

  • The works we have found most similar to our approach of applying data augmentation to the problem of packet losses are [31], which studies different training approaches, including data augmentation, for dealing with packet losses in an emotion recognition task, and [17], which proposes a deep-learning-based PLC system, evaluates it through the Word Error Rate (WER) of Automatic Speech Recognition (ASR), and compares results without the PLC system, with and without data augmentation in ASR training

  • The main goal is to reduce the Word Error Rate (WER) of speech recognition, evaluated on simulated data and on real data from call centers
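Since WER is the metric used throughout, a minimal sketch of how it is commonly computed may help. It takes the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the ASR hypothesis and divides it by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The scoring tool and text normalization actually used in the paper are not stated in this excerpt.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration: tokenizes on whitespace and applies
    no text normalization (casing, punctuation), which real scoring tools do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.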


Summary

Introduction

Most calls in call centers are recorded, in many cases simply because current legislation makes it mandatory. Beyond legal requirements, these recordings are a rich source of information about users, call center operators, campaign efficiency, and market trends, which can be translated into valuable business insight. Voice-over-IP (VoIP), the transmission of speech in IP packets, is now mainstream in call centers and their recording systems. VoIP can use different speech codecs, and the length of the packets used to transmit the speech signal depends on the selected codec. The packet length is between 20 and 40 ms.
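To make the effect of packet loss concrete, the following minimal sketch degrades a waveform by silencing randomly chosen packets, which is one simple way to produce augmented training data of this kind. It is an illustration under stated assumptions, not the simulation procedure used in the paper: the function name `simulate_packet_loss`, the 8 kHz sample rate, the independent per-packet loss model, and the zero-filling of lost packets are all assumptions. Losses on real networks are often bursty, which this model does not capture.

```python
import random


def simulate_packet_loss(samples, sample_rate=8000, packet_ms=20,
                         loss_rate=0.05, seed=None):
    """Return (degraded_samples, lost_packet_indices).

    Hypothetical helper for illustration. The signal is cut into packets of
    packet_ms milliseconds, and each packet is dropped independently with
    probability loss_rate. Dropped packets are replaced by silence (zeros).
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if packet_ms <= 0 or sample_rate <= 0:
        raise ValueError("packet_ms and sample_rate must be positive")

    rng = random.Random(seed)
    packet_len = max(1, sample_rate * packet_ms // 1000)  # samples per packet
    degraded = list(samples)  # work on a copy, leave the input untouched
    lost = []

    for index, start in enumerate(range(0, len(degraded), packet_len)):
        if rng.random() < loss_rate:
            end = min(start + packet_len, len(degraded))
            degraded[start:end] = [0.0] * (end - start)
            lost.append(index)
    return degraded, lost
```

With the assumed 8 kHz sample rate, a 20 ms packet spans 160 samples and a 40 ms packet spans 320, so each lost packet removes a short but audible stretch of speech.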
