Abstract

A significant ongoing issue in realistic humanoid robotics (RHRs) is inaccurate speech-to-mouth synchronisation. Even the most advanced robotic systems cannot authentically emulate the natural movements of the human jaw, lips and tongue during verbal communication. These visual and functional irregularities have the potential to propagate the Uncanny Valley Effect (UVE) and reduce speech understanding in human-robot interaction (HRI). This paper outlines the development and testing of a novel Computer Aided Design (CAD) robotic mouth prototype with buccinator actuators for emulating the fluid movements of the human mouth. The robotic mouth system incorporates a custom Machine Learning (ML) application that measures the acoustic qualities of speech synthesis (SS) and translates this data into servomotor triangulation for triggering jaw, lip and tongue positions. The objective of this study is to improve current robotic mouth design and provide engineers with a framework for increasing the authenticity, accuracy and communication capabilities of RHRs for HRI. The primary contributions of this study are the engineering of a robotic mouth prototype and the programming of a speech processing application that achieved 79.4% syllable accuracy, 86.7% lip synchronisation accuracy and a 0.1 s speech-to-mouth articulation differential.

Highlights

  • Many scholars consider the creation of a realistic humanoid robot (RHR) that is perceptually indistinguishable in appearance and functionality from the average human to be the apex of mankind’s technological achievements [1], [2]

  • The survey for the robotic mouth consisted of 22 quantitative questions set to video and audio, based on a recent human-robot interaction (HRI) study [32]. The participant sample comprised 50 individuals recruited at random from online social media and forums

  • Statistical Package for Social Sciences (SPSS) analytical software indicated a moderate-to-low level of coefficients, scoring between 0.6 and 0.8. The standard deviation ranged between 1.66 and 2.55, suggesting a high level of dispersion, and confidence set at 95% indicates a low margin of error, with variance ranging between S2: 2.7 and S2: 6.5. These highly variable results are indicative of the inconsistencies in the reliability of human perception in determining human-likeness, shown in Tables 1 and 2
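The dispersion and confidence figures above come from SPSS; as an illustrative sketch only (the ratings below are hypothetical, not the study's data), the same summary statistics can be reproduced with Python's standard library:

```python
import math
import statistics

def summarise(ratings):
    """Return (mean, sample standard deviation, 95% CI half-width).

    Uses the normal approximation (z = 1.96) for the confidence interval.
    """
    n = len(ratings)
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)          # sample standard deviation
    margin = 1.96 * sd / math.sqrt(n)       # 95% margin of error
    return mean, sd, margin

# Hypothetical 7-point Likert responses from ten participants
ratings = [5, 6, 3, 7, 4, 5, 2, 6, 5, 4]
mean, sd, margin = summarise(ratings)
print(f"mean={mean:.2f} sd={sd:.2f} 95% CI ±{margin:.2f}")
```

A high standard deviation relative to the rating scale, as reported in the study, widens this interval and signals disagreement between respondents.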

Summary

Introduction

Many scholars consider the creation of an RHR that is perceptually indistinguishable in appearance and functionality from the average human to be the apex of mankind’s technological achievements [1], [2]. A key failure in RHR design is accurately synthesising the appearance, speech, movement and intelligence of RHRs to function naturally in the real world [4]. This consideration is significant, as the longer the interaction between humans and RHRs, the greater the probability that visual and functional irregularities will materialise and betray the robot’s artificiality [5]. Audio-signal-dependent lip synchronisation applications have a higher response time, but they are not as precise as text processing methods due to the highly variable sound waves in natural voice output [9]. An advantage of audio-signal processing over text extraction is the ability to implement human speech in place of an SS, which is a common practice in contemporary RHR design.
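The paper's ML application is not detailed in this excerpt; as a minimal sketch of the audio-signal-driven approach described above (function names, frame size and servo range are all hypothetical), a per-frame amplitude envelope can be mapped directly to a jaw servo angle:

```python
import math

def envelope_to_servo(samples, frame_size=160, jaw_min=0.0, jaw_max=30.0):
    """Map per-frame RMS amplitude of an audio signal to jaw servo angles.

    Assumes samples are normalised to [-1.0, 1.0] and angles are in degrees.
    Louder frames open the jaw wider, scaled by the clip's peak level.
    """
    peak = max(abs(s) for s in samples) or 1.0
    angles = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        angles.append(jaw_min + (jaw_max - jaw_min) * min(rms / peak, 1.0))
    return angles

# Synthetic input: a 200 Hz sine burst followed by silence at 16 kHz
sr = 16000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(1600)]
silence = [0.0] * 1600
angles = envelope_to_servo(tone + silence)
```

This illustrates the trade-off noted in the text: the envelope responds quickly to any sound energy, but unlike a text-driven method it carries no phoneme information, so lip shape precision is limited.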


