Systematic review of training methods for conversational systems: the potential of datasets validated with user experience

Carolina Abrantes, Juliana Camargo, José Nunes, Maria João Antunes, Luís Nóbrega, Óscar Mealha

doi:10.15847/obsobs17220232263

Open Access

https://doi.org/10.15847/obsobs17220232263

Journal: Observatorio (OBS*)	Publication Date: Jun 14, 2023
Citations: 1	License type: CC BY-NC 2.0

Affiliation: University of Aveiro

Abstract

The increasing maturity of artificial intelligence technologies such as Machine Learning algorithms, Natural Language Processing (NLP), Automatic Speech Recognition (ASR) and Natural Language Generation is changing the way users interact with technology. As voice interactions become commonplace, it is important to understand how such systems are trained. This systematic review investigates how human data is collected for training conversational agents, with specific interest in datasets obtained directly from human participation in real contexts of need and use. The review followed the PRISMA guidelines. Searches were conducted in Scopus, Web of Science and ProQuest, restricted to publications in English from a 15-year period (2005-2020), using pre-defined criteria to obtain a detailed, holistic perspective of practices published until July 2020. Across both search iterations, a total of 22 papers were considered for this review. The main contributions of these papers reveal a common use of learning from demonstration/observation and of crowdsourcing methods in system training and dataset cataloguing, alongside handwriting and sentence labelling and Wizard-of-Oz based studies.