Abstract

This paper proposes using acoustic features with deep neural network (DNN) and convolutional neural network (CNN) models to classify video lectures in a massive open online course (MOOC). The models exploit the lecturer's voice pattern to identify the speaker and assign each video lecture to the correct speaker category. Filter bank and Mel frequency cepstral coefficient (MFCC) features, along with their first and second order derivatives (Δ/ΔΔ), are used as inputs to the proposed models. These features are extracted from the speech signal, which is obtained by separating the audio track from each video lecture using FFmpeg. The deep learning models are evaluated using precision, recall, and F1 score, and their accuracy on both acoustic feature sets is compared against traditional machine learning classifiers for speaker identification. The 2D-CNN with MFCC features achieves a 3% to 7% improvement in classification accuracy over the DNN, and roughly double the accuracy of the shallow machine learning classifiers. With an F1 score of 85.71% for text-independent speaker identification, the proposed 2D-CNN model makes speaker identification a plausible approach for automatically organizing video lectures in a MOOC setting.
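
The abstract outlines a concrete pipeline: separate the audio track with FFmpeg, compute MFCC features with Δ/ΔΔ derivatives, and feed them to a 2D-CNN speaker classifier. Below is a minimal sketch of that pipeline, assuming librosa and TensorFlow/Keras as the audio and deep learning libraries; the file names, sample rate, frame count, layer sizes, and number of speaker classes are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the pipeline described in the abstract: FFmpeg audio extraction,
# MFCC + delta/delta-delta features, and a small 2D-CNN speaker classifier.
# Assumptions (not from the paper): librosa/TensorFlow as libraries, 16 kHz
# mono audio, 13 MFCCs, fixed 300-frame windows, and 10 speaker classes.
import subprocess
import numpy as np
import librosa
import tensorflow as tf

def extract_audio(video_path: str, wav_path: str, sr: int = 16000) -> None:
    """Separate the audio track from a lecture video with FFmpeg (-vn drops video)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", str(sr), wav_path],
        check=True,
    )

def mfcc_with_deltas(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Return a (3*n_mfcc, frames) matrix of MFCCs stacked with their Δ and ΔΔ."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)            # first-order derivative (Δ)
    delta2 = librosa.feature.delta(mfcc, order=2)  # second-order derivative (ΔΔ)
    return np.vstack([mfcc, delta, delta2])        # shape: (39, frames)

def build_2d_cnn(n_speakers: int = 10, frames: int = 300) -> tf.keras.Model:
    """Small 2D-CNN over the (39, frames, 1) feature map; layer sizes are illustrative."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(39, frames, 1)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_speakers, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage: score one lecture clip against the speaker classes.
# extract_audio("lecture01.mp4", "lecture01.wav")
# feats = mfcc_with_deltas("lecture01.wav")[:, :300]          # crop to fixed width
# model = build_2d_cnn()
# probs = model.predict(feats[np.newaxis, ..., np.newaxis])   # input: (1, 39, 300, 1)
```

In practice, fixed-width feature windows would be cut from each lecture and the model trained on speaker-labeled segments before being evaluated with precision, recall, and F1 score as the abstract describes.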
