Abstract

With the rapid global penetration of the Internet and increasing bandwidth, the volume of audio data is growing significantly. A recent Internet survey estimates that video and audio streaming will account for more than 50% of Internet traffic. Moreover, with the recent rise of voice assistants, the significance of audio data, especially voice data, is at its peak. Against this background, there is a need to analyze audio data to gather insights with wider implications in the domains of health, marketing, and media. In this project, an open-source approach is proposed to analyze audio data directly using acoustic features such as Mel-frequency cepstral coefficients (MFCCs), rather than converting the audio to text and performing the analysis on the converted textual data. In this work, a convolutional neural network (CNN) model is developed to predict emotions from the given audio data.
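As an illustration of the acoustic features the abstract refers to, the following is a minimal NumPy/SciPy sketch of a standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT). The frame sizes, filter counts, and the synthetic test tone are illustrative assumptions, not parameters taken from this work.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies -> log -> DCT gives cepstral coefficients
    fb = mel_filterbank(n_filters, n_fft, sr)
    energies = np.maximum(power @ fb.T, 1e-10)
    return dct(np.log(energies), type=2, axis=1, norm="ortho")[:, :n_ceps]

# One second of a synthetic 440 Hz tone stands in for real speech here
t = np.arange(16000) / 16000.0
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # one 13-coefficient vector per 10 ms frame
```

In a pipeline like the one described, the resulting frame-by-coefficient matrix would be the 2-D input fed to a CNN for emotion classification.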
