Abstract

Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classification model is trained for the recognition of 24 known speakers. The model is based on a convolutional neural network (CNN) architecture. Since not all speakers are known in radio shows, a CNN-based speaker diarization method is also proposed. The trained model is used for the extraction of fixed-size identity d-vectors. Several clustering algorithms are evaluated, having the d-vectors as input. The supervised speaker recognition model for 24 speakers scores an accuracy of 88.34%, while unsupervised speaker diarization scores a maximum accuracy of 87.22%, as tested on an audio file with speech segments from three unknown speakers. The results are considered encouraging regarding the applicability of the proposed methodology.

Highlights

  • In the past decade, there has been a breakthrough concerning the production and distribution of digital content on the web and in social media

  • We have presented a framework that addresses the need for evolution in radio production practice in the evolving digital media ecosystem and the challenges for efficient management of publically available big audio data

  • Radio production has to adapt in the direction of providing richer content, and more discoverable audio-on-demand, to maintain a broader audience and be more appealing to younger audiences

Read more

Summary

Introduction

There has been a breakthrough concerning the production and distribution of digital content on the web and in social media. This outbreak of available data has highlighted the importance of the development of suitable tools and frameworks for editing and management of the created content, aiming at more efficient distribution and consumption. We used the existing audio files as well as normalized versions of them This aims at making recognition performance robust to energy fluctuations, e.g., with the speaker moving farther and closer to the mic

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.