An Optimization of Audio Classification and Segmentation using GASOM Algorithm

Dabbabi Karim, Hajji Salah, Cherif Adnen

doi:10.14569/ijacsa.2018.090424

Open Access

https://doi.org/10.14569/ijacsa.2018.090424

Abstract

Nowadays, multimedia content analysis occupies an important place in widely used applications. It may depend on audio segmentation, which is one of the many tools used in this area. In this paper, we present optimized audio classification and segmentation algorithms that segment a superimposed audio stream according to its content into 10 main audio types: speech, non-speech, silence, male speech, female speech, music, environmental sounds, and music genres such as classical music, jazz, and electronic music. We have tested the KNN, SVM, and GASOM algorithms on two audio classification systems. In the first system, the audio stream is discriminated into speech/non-speech, pure-speech/silence, male speech/female speech, and music/environmental sounds. In the second system, the audio stream is segmented into music/speech, pure-speech/silence, and male speech/female speech. In both systems, pure-speech/silence discrimination is performed by a rule-based classifier, and the music segments are discriminated into different music genres using a decision tree as the classifier. The first audio classification system achieved higher performance than the second one. Indeed, in the first system, using the GASOM algorithm with the leave-one-out validation technique, the average accuracy reached 99.17% for music/environmental sounds discrimination. Moreover, in both systems, the GASOM algorithm always achieved the best performance compared to the KNN and SVM algorithms. Furthermore, in the first system, the GASOM algorithm yielded an optimized time consumption compared with that obtained using the HMM and MLP methods.
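
The leave-one-out validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain KNN classifier rather than GASOM, and the two-dimensional feature vectors, the class labels attached to them, and the function names `knn_predict` and `leave_one_out_accuracy` are all hypothetical, chosen only to show how each sample is held out once and classified from the remaining ones.

```python
import math


def knn_predict(train, query, k=1):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def leave_one_out_accuracy(samples, k=1):
    """Hold out each sample once, classify it from all the others,
    and return the fraction of held-out samples classified correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(samples)


# Toy feature vectors (hypothetical values, not taken from the paper).
samples = [
    ((0.10, 0.20), "music"),
    ((0.15, 0.25), "music"),
    ((0.20, 0.15), "music"),
    ((0.80, 0.90), "environmental"),
    ((0.85, 0.80), "environmental"),
    ((0.90, 0.85), "environmental"),
]

accuracy = leave_one_out_accuracy(samples, k=1)
print(f"Leave-one-out accuracy: {accuracy:.2%}")
```

With N samples, this procedure trains and tests N times, so the reported figure is an average over every possible single-sample test set. In a real system the toy tuples would be replaced by feature vectors extracted from audio segments.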

Highlights

Content-based indexing and retrieval technologies help users search for multimedia content on search engines more accurately and efficiently by giving them direct access to the required content
Audio content analysis applications can be divided into two parts: the first discriminates an audio stream into homogeneous regions, and the second discriminates a speech stream into segments of different speakers
The first audio database used to evaluate our algorithms contains many audio types, such as speech, music, environmental sounds, others1, others2, and others3, which are extracted from different audio events

Summary

Introduction

Content-based indexing and retrieval technologies help users search for multimedia content on search engines more accurately and efficiently by giving them direct access to the required content. Because audio data contains alternating sections of different audio types, automatically classifying its content into appropriate audio classes is a fundamental step in processing audio streams. This kind of separation is called audio content classification. Segmenting and classifying audio streams according to their content is a useful means of analyzing audio and video and of understanding their content. Performing this task requires an efficient and accurate technique. Computing statistics using MFCC coefficients requires a large amount of data for training [12].

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol 9, No 4, 2018, www.ijacsa.thesai.org

Methods

Results

Conclusion