Speech Synthesis Research Articles

Social skills training by human trainers is a well-established method of teaching appropriate social and communication skills and strengthening social self-efficacy. Specifically, human social skills training is a fundamental approach to teaching and learning the rules of social interaction. However, it is cost-ineffective and offers low accessibility, since the number of professional trainers is limited. A conversational agent is a system that can communicate with a human being in a natural language. We proposed to overcome the limitations of current social skills training with conversational agents. Our system is capable of speech recognition, response selection, and speech synthesis and can also generate nonverbal behaviors. We developed a system that incorporated automated social skills training that completely adheres to the training model of Bellack et al through a conversational agent. This study aimed to validate the training effect of a conversational agent-based social skills training system in members of the general population during a 4-week training session. We compare 2 groups (with and without training) and hypothesize that the trained group's social skills will improve. Furthermore, this study sought to clarify the effect size for future larger-scale evaluations, including a much larger group of different social pathological phenomena. For the experiment, 26 healthy Japanese participants were separated into 2 groups, where we hypothesized that group 1 (system trained) will make greater improvement than group 2 (nontrained). System training was done as a 4-week intervention where the participants visit the examination room every week. Each training session included social skills training with a conversational agent for 3 basic skills. We evaluated the training effect using questionnaires in pre- and posttraining evaluations. In addition to the questionnaires, we conducted a performance test that required the social cognition and expression of participants in new role-play scenarios. Blind ratings by third-party trainers were made by watching recorded role-play videos. A nonparametric Wilcoxson Rank Sum test was performed for each variable. Improvement between pre- and posttraining evaluations was used to compare the 2 groups. Moreover, we compared the statistical significance from the questionnaires and ratings between the 2 groups. Of the 26 recruited participants, 18 completed this experiment: 9 in group 1 and 9 in group 2. Those in group 1 achieved significant improvement in generalized self-efficacy (P=.02; effect size r=0.53). We also found a significant decrease in state anxiety presence (P=.04; r=0.49), measured by the State-Trait Anxiety Inventory (STAI). For ratings by third-party trainers, speech clarity was significantly strengthened in group 1 (P=.03; r=0.30). Our findings reveal the usefulness of the automated social skills training after a 4-week training period. This study confirms a large effect size between groups on generalized self-efficacy, state anxiety presence, and speech clarity.

Subject of the study: the process of acoustic information protection in computer systems of critical applications to ensure the required level of system security. The aim of the article is to analyze the methods of acoustic information protection in computer systems of critical application by means of masking to ensure the impossibility of unauthorized access to the system. The article solves the following tasks: to analyze the software and hardware masking of speech; to study the masking of speech messages in order to introduce unrecognizability; to study the features of speech message compression; to investigate methods of covert transmission of acoustic information. The results of the work, which were obtained using mathematical methods of information transformation in computer systems, are potentially possible methods of masking speech messages to ensure the impossibility of unauthorized access to the system. The analysis of the functioning of the presented methods has led to the following conclusions. One of the perspective directions of acoustic information protection in communication channels and dedicated premises can be considered the creation and development of computerized speech masking systems along with or in conjunction with traditional technologies of semantic protection of acoustic information, namely, speech signal classification based on cryptographic algorithms. The main requirements for today's systems that provide protection of acoustic information in critical computer systems are speed and efficiency of various speech signal processing procedures using standard inexpensive technical means of computer telephony, namely: a personal computer, sound card, telephone line interface device and/or modem. These requirements can be met by applying digital methods of dynamic spectral analysis, i.e. synthesis of speech and audio signals. The choice of specific methods and means of speech masking as one of the types of semantic protection of acoustic information will depend on the practical requirements for the speech protection system and the technical characteristics of the acoustic information transmission channel. Further research is desirable to analyze the possible use of methods for synthesizing large ensembles of quasi-orthogonal discrete signals with improved ensemble, structural and correlation properties to ensure higher security indicators of acoustic channels in computer systems of critical applications.

Speech Synthesis Research Articles

Related Topics

Articles published on Speech Synthesis

Audio Anti-Spoofing Based on Audio Feature Fusion

SeDepTTS: Enhancing the Naturalness via Semantic Dependency and Local Convolution for Text-to-Speech Synthesis

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

Avocodo: Generative Adversarial Network for Artifact-Free Vocoder

Towards Voice Reconstruction from EEG during Imagined Speech

What Does Your Face Sound Like? 3D Face Shape towards Voice

A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech

Computational Approaches to Phonology: Advances in Speech Recognition and Synthesis

ADAPT-TTS: HIGH-QUALITY ZERO-SHOT MULTI-SPEAKER TEXT-TO-SPEECH ADAPTIVE-BASED FOR VIETNAMESE

Speech Synthesis in the “Mother Tongue”: Designing, Training, and Evaluating a Text-to-Speech System for Yiddish

LiVoAuth: Liveness Detection in Voiceprint Authentication With Random Challenges and Detection Modes

Real-Time Driver Drowsiness Detection Using Images

Analysis of the survey of voice synthesis technology

1991 to 2019: The rise of machine interpreting research

Semi-Supervised Learning for Robust Emotional Speech Synthesis with Limited Data

Research on a Guide Dog Robot for Expressing Visual Environment by Voice

A Self-Attentional ResNet-LightGBM Model for IoT-Enabled Voice Liveness Detection

Differentiation of the Functional Systems of Speech and Language and the Study of the Differences of the Neural Networks That Support Them

The Validation of Automated Social Skills Training in Members of the General Population Over 4 Weeks: Comparative Study.

ANALYSIS OF ACOUSTIC INFORMATION PROTECTION METHODS IN CRITICAL APPLICATIONS

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Speech Synthesis Research Articles

Related Topics

Articles published on Speech Synthesis

Audio Anti-Spoofing Based on Audio Feature Fusion

SeDepTTS: Enhancing the Naturalness via Semantic Dependency and Local Convolution for Text-to-Speech Synthesis

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

Avocodo: Generative Adversarial Network for Artifact-Free Vocoder

Towards Voice Reconstruction from EEG during Imagined Speech

What Does Your Face Sound Like? 3D Face Shape towards Voice

A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech

Computational Approaches to Phonology: Advances in Speech Recognition and Synthesis

ADAPT-TTS: HIGH-QUALITY ZERO-SHOT MULTI-SPEAKER TEXT-TO-SPEECH ADAPTIVE-BASED FOR VIETNAMESE

Speech Synthesis in the “Mother Tongue”: Designing, Training, and Evaluating a Text-to-Speech System for Yiddish

LiVoAuth: Liveness Detection in Voiceprint Authentication With Random Challenges and Detection Modes

Real-Time Driver Drowsiness Detection Using Images

Analysis of the survey of voice synthesis technology

1991 to 2019: The rise of machine interpreting research

Semi-Supervised Learning for Robust Emotional Speech Synthesis with Limited Data

Research on a Guide Dog Robot for Expressing Visual Environment by Voice

A Self-Attentional ResNet-LightGBM Model for IoT-Enabled Voice Liveness Detection

Differentiation of the Functional Systems of Speech and Language and the Study of the Differences of the Neural Networks That Support Them

The Validation of Automated Social Skills Training in Members of the General Population Over 4 Weeks: Comparative Study.

ANALYSIS OF ACOUSTIC INFORMATION PROTECTION METHODS IN CRITICAL APPLICATIONS