Generative Adversarial Network-Based Neural Audio Caption Model for Oral Evaluation

Liu Zhang, Hanyi Zhang, Qing Liu, Chao Shu, Cheng Xie, Jin Guo

doi:10.3390/electronics9030424

Abstract

Oral evaluation is one of the most critical processes in children’s language learning. Traditionally, a Scoring Rubric is widely used in oral evaluation to produce a ranking score by assessing a tester’s word accuracy, phoneme accuracy, fluency, and accent position. In recent years, driven by emerging market demands, oral evaluation has been expected to provide not only a single pronunciation score but also in-depth, meaningful comments based on content, context, logic, and understanding. However, producing such comments with the Scoring Rubric requires extensive manual work by oral evaluation experts, which is considered uneconomical and inefficient in the current market. Therefore, this paper proposes an automated expert comment generation approach for oral evaluation. The approach first extracts oral features from the children’s audio and text features from the corresponding expert comments. Then, a Gated Recurrent Unit (GRU) is applied to encode the oral features into the model. Afterwards, a Long Short-Term Memory (LSTM) model is trained to learn the mapping between oral features and text features and to generate expert comments for new oral audio. Finally, a Generative Adversarial Network (GAN) is incorporated to improve the quality of the generated comments. It generates pseudo-comments that are used to train the discriminator to recognize human-like comments. The proposed approach is evaluated on a real-world audio dataset (children’s oral audio) collected by our collaborating company. It is also integrated into a commercial application to generate expert comments for children’s oral evaluation. The experimental results and the lessons learned from real-world applications show that the proposed approach is effective in providing meaningful comments for oral evaluation.
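The encoding step described in the abstract can be illustrated with a minimal sketch of a GRU cell that folds a sequence of audio feature frames into one fixed-size state. This is not the authors' implementation: the class name `GRUCell`, the helper `encode`, the feature and hidden sizes, the random weights, and the omission of bias terms are all assumptions made for illustration, and the update rule follows one common GRU convention.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Hypothetical minimal GRU cell (biases omitted for brevity)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One (W, U) pair per gate: update "z", reset "r", candidate "n".
        self.W = {g: init_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: init_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        # The reset gate scales the previous state before the candidate is computed.
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b)
             for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # Convention assumed here: new state = (1 - z) * old state + z * candidate.
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def encode(frames, cell):
    # Run the cell over every frame; the final state summarizes the sequence.
    h = [0.0] * cell.hidden_size
    for x in frames:
        h = cell.step(x, h)
    return h


# Three made-up frames of 3-dimensional audio features.
frames = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
cell = GRUCell(input_size=3, hidden_size=4)
state = encode(frames, cell)
```

In an encoder-decoder setup of the kind the abstract outlines, a state such as `state` would be handed to the decoder that generates the comment text. The weights here are untrained, so the values themselves carry no meaning.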

Highlights

Oral evaluation is a language-testing process that covers pronunciation accuracy, fluency, integrity, logical ability, understanding ability, and so on
With the development of deep learning, researchers have proposed a large number of acoustic model (AM) methods based on deep neural networks for speech recognition, which are generally divided into hybrid acoustic models and end-to-end acoustic models
The Generative Adversarial Network (GAN)-Based Neural Audio Caption Model is composed of two neural networks: a generative neural network and a discriminative neural network
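How the two networks in the last highlight pull against each other can be sketched with the standard GAN losses. This page does not state the paper's exact training objective, so the binary cross-entropy form below, the non-saturating generator loss, and the function names `discriminator_loss` and `generator_loss` are assumptions for illustration only. The inputs are the discriminator's probabilities that a comment is human-written.

```python
import math


def discriminator_loss(p_real, p_fake, eps=1e-12):
    """Binary cross-entropy: real comments should score near 1, generated near 0."""
    real_term = sum(math.log(p + eps) for p in p_real) / len(p_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in p_fake) / len(p_fake)
    return -(real_term + fake_term)


def generator_loss(p_fake, eps=1e-12):
    """Non-saturating form: the generator wants its comments to score near 1."""
    return -sum(math.log(p + eps) for p in p_fake) / len(p_fake)


# Made-up discriminator outputs for two expert comments and two generated ones.
p_real = [0.9, 0.8]
p_fake = [0.2, 0.1]
d_loss = discriminator_loss(p_real, p_fake)
g_loss = generator_loss(p_fake)
```

With these made-up numbers the discriminator separates the two groups well, so `d_loss` is small and `g_loss` is large. As the generator improves, the scores in `p_fake` rise and the two losses move in opposite directions.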

Summary

Introduction

Oral evaluation is a language-testing process that covers pronunciation accuracy, fluency, integrity, logical ability, understanding ability, and so on. Our previous work tried to apply a caption generation model to generate expert comments for oral evaluation [11]. A Neural Audio Caption Model (NACM) is proposed to generate expert comments from oral audio. Based on NACM, we propose an improved model, the GAN-based NACM (GNACM), which can generate more accurate and complete expert comments than the previous work. Beyond the Scoring Rubric approach, this work is an early attempt to generate expert comments for oral evaluation.

Related Work

Caption Generation Model

Audio Feature Extraction Model

Text Generation Model Based on Deep Learning

The Approach

Audio Feature Extraction

Text Preprocessing

Neural Audio Caption Model

Encoder

Decoder

Generative Adversarial Network-Based Neural Audio Caption Model

Discriminator

Generator

Case Study

Scenario

Dataset

Performance Testing

Evaluation Metrics

Evaluation Results

Application

Baseline System Based on NACM

GNACM for Children Oral Evaluation

Lesson Learned

Conclusions


Journal: Electronics
Publication Date: Mar 3, 2020
Citations: 2
License type: CC BY 4.0


