Abstract

Robust speech processing for single-stream audio data has achieved significant progress in the last decade. However, multi-stream speech processing poses challenges not present in single-stream data. Peer-led team learning (PLTL) is a teaching paradigm popular among US universities for undergraduate education in STEM courses. In collaboration with the UTDallas Student Success Center, we collected the CRSS-PLTL and CRSS-PLTL-II corpora for assessment of speech communications in PLTL sessions. Both corpora consist of longitudinal recordings of five teams studying undergraduate Chemistry and Calculus courses and comprise 300 hours of speech data. The multi-stream audio data poses unique challenges: (i) time-synchronization; (ii) multi-stream speech processing for speech activity detection, speaker diarization and linking, and speech recognition; and (iii) behavioral informatics. We used a 1 kHz tone at the start and end of each session for time-synchronization of the multi-stream audio. We leveraged an auto-encoder neural network to combine MFCC features from multiple streams into compact bottleneck features. After diarization, each speaker segment is analyzed for behavioral metrics such as (i) dominance; (ii) curiosity in terms of question inflections; (iii) speech rate; (iv) cohesion; and (v) turn-duration and turn-taking patterns. Results are presented on individual and team-based conversational interactions. This research suggests emerging opportunities for wearable speech systems in education research.
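The tone-based time-synchronization step can be illustrated with a minimal sketch. The abstract does not say how the 1 kHz tone is detected, so the Goertzel detector below is an assumed stand-in, not the authors' method. The function names `goertzel_power` and `find_tone_onset`, the 8 kHz sample rate, the 160-sample frame, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math


def goertzel_power(frame, sample_rate, target_hz):
    """Squared DFT magnitude of `frame` at the bin nearest target_hz (Goertzel)."""
    n = len(frame)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def find_tone_onset(samples, sample_rate, target_hz=1000.0,
                    frame_len=160, ratio=0.5):
    """Return the start index of the first frame dominated by target_hz, or None.

    Assumed decision rule (not from the source): a frame counts as tone when
    the Goertzel power, normalized so that a pure sinusoid at the target bin
    scores about 1.0, reaches `ratio`.
    """
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        if energy == 0.0:
            continue  # silent frame, nothing to test
        tone_fraction = goertzel_power(frame, sample_rate, target_hz) / (
            energy * frame_len / 2.0)
        if tone_fraction >= ratio:
            return start
    return None


def stream_offset(stream_a, stream_b, sample_rate):
    """Offset in seconds to subtract from stream_b's timeline to match stream_a."""
    onset_a = find_tone_onset(stream_a, sample_rate)
    onset_b = find_tone_onset(stream_b, sample_rate)
    if onset_a is None or onset_b is None:
        raise ValueError("tone not found in one of the streams")
    return (onset_b - onset_a) / sample_rate
```

Under these assumptions, two recordings of the same session are aligned by locating the tone onset in each stream and shifting one stream by the difference. Repeating the search at the end-of-session tone would give a second anchor point, which could in principle be used to check for clock drift between recorders.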
