Abstract

Speech processing technology has great potential in the medical field to provide beneficial solutions for both patients and doctors. Speech interfaces, represented by speech synthesis and speech recognition, can be used to transcribe medical documents, control medical devices, correct speech and hearing impairments, and assist the visually impaired. Natural-sounding speech synthesis, however, depends on accurate prediction of prosodic phrase boundaries. This study proposes a method for building a reliable training corpus for deep learning-based prosody boundary prediction models. In addition, we offer a way to generate a rule-based model that predicts prosody boundaries from the constructed corpus and to use its output to train a deep learning-based model. The resulting corpus is coherent even though many workers participated in its development: the estimated pairwise agreement of the corpus annotations is between 0.7477 and 0.7916, and the kappa coefficient (K) is between 0.7057 and 0.7569. The deep learning-based model built on the rules obtained from the corpus achieved a prediction accuracy of 78.57% for three-level prosody phrase boundaries and 87.33% for two-level prosody phrase boundaries.
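The two agreement figures reported in the abstract can be illustrated with a small sketch. The text here does not say which kappa variant was used or how annotators were paired, so the code below assumes the simplest case: Cohen's kappa between two annotators. The function names, the label encoding (0 = no break, 1 = minor break, 2 = major break), and the label sequences are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def pairwise_agreement(a, b):
    """Fraction of positions where two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Assumption: two annotators only. The study may have used a different
    multi-annotator variant of kappa.
    """
    n = len(a)
    p_observed = pairwise_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both pick the same label independently.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical boundary labels for ten word junctures.
annotator_a = [0, 1, 2, 0, 1, 0, 2, 0, 1, 0]
annotator_b = [0, 1, 2, 0, 0, 0, 2, 1, 1, 0]

print("pairwise agreement:", pairwise_agreement(annotator_a, annotator_b))
print("Cohen's kappa:", cohen_kappa(annotator_a, annotator_b))
```

Kappa is lower than raw agreement because it discounts the matches that two annotators would produce by chance, which is consistent with the reported K range sitting below the reported pairwise agreement range.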

Highlights

  • Speech processing technology has demonstrated great potential to provide beneficial solutions for both patients and doctors in smart healthcare

  • The voice interface represented by speech synthesis and speech recognition can be used to transcribe medical documents, control medical devices, mitigate speech and hearing impairments, and support the visually impaired

  • This study proposes a new methodology for the reliable prediction of prosodic breaks using linguistic knowledge and bi-gram information obtained from a small-scale corpus


Summary

Introduction

Speech processing technology has demonstrated great potential to provide beneficial solutions for both patients and doctors in smart healthcare. Recent advances in speech processing, together with other technologies such as the Internet of Things (IoT) and communication systems, have significantly improved contemporary healthcare systems [1,2,3]. The voice interface represented by speech synthesis and speech recognition can be used to transcribe medical documents, control medical devices, mitigate speech and hearing impairments, and support the visually impaired. Speech can also be used as a biomarker in diagnosing psychological disorders. Environmental control assistance (e.g., device control, audio level control, nursing assistance requests, decision-making assistance) can aid in the recovery of patients with reduced mobility [7].

