Abstract
A large number of audio features have been proposed in the literature to characterize the content of audio signals. To overcome specific problems with the existing features (such as a lack of discriminative power) and to reduce the need for manual feature selection, in this article we propose an evolutionary feature synthesis technique with a built-in feature selection scheme. The proposed synthesis process searches for optimal linear/nonlinear operators and feature weights from a pre-defined multi-dimensional search space to generate a highly discriminative set of new (artificial) features. The evolutionary search process is based on a stochastic optimization approach in which a multi-dimensional particle swarm optimization algorithm, along with fractional global best formation and heterogeneous particle behavior techniques, is applied. Unlike many existing feature generation approaches, the dimensionality of the synthesized feature vector is also searched and optimized within a set range in order to better meet the varying requirements set by many practical applications and classifiers. The new features generated by the proposed synthesis approach are compared with typical low-level audio features in several classification and retrieval tasks. The results demonstrate a clear improvement of up to 15–20% in average retrieval performance. Moreover, the proposed synthesis technique surpasses the synthesis performance of evolutionary artificial neural networks, exhibiting a considerable capability to accurately distinguish among different audio classes.
Highlights
Due to the drastically increased amount of multimedia data available on the Internet and in various public and personal databases, the development of efficient indexing and retrieval methods for large multimedia databases has become a widely studied research topic.
The audio class samples are collected from several data sources: the speech classes are derived from the TIMIT database; the music classes are from the RWC Music Database and another music collection at Tampere University of Technology (TUT); the “general” audio sounds were purchased from the StockMusic.com webpage; and the singing and whistling samples were self-recorded and produced at TUT.
In the experiments, unless stated otherwise, the following parameters and settings were used for the evolutionary feature synthesis (EFS): the depth of the synthesis was set to K = 7, meaning that 7 operators and K + 1 = 8 features were chosen for the synthesis process of each output vector element, and the total number of operators, listed in Table 4 for features fa and fb, was set to Θ = 18.
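To make the relationship between the synthesis depth K and the K + 1 selected features concrete, the following is a minimal sketch of how one output vector element could be built by chaining K binary operators over K + 1 weighted input features. It is an illustration under stated assumptions, not the article's exact formulation: the left-to-right chaining order, the function name `synthesize_element`, and the five operators in `OPERATORS` are all hypothetical (the article's Table 4 lists 18 operators, which are not reproduced here).

```python
from typing import Callable, Dict, Sequence

# Hypothetical operator set for illustration only; the article's Table 4
# defines 18 operators, which are NOT reproduced here.
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "max": max,
    "min": min,
}


def synthesize_element(
    features: Sequence[float],
    weights: Sequence[float],
    feature_idx: Sequence[int],
    operator_names: Sequence[str],
) -> float:
    """Combine K + 1 weighted features with K operators (assumed left fold).

    features       -- the original feature vector
    weights        -- K + 1 weights, one per selected feature
    feature_idx    -- K + 1 indices into `features` (the selected features)
    operator_names -- K keys into OPERATORS, applied in order
    """
    depth = len(operator_names)  # this is K
    if len(feature_idx) != depth + 1 or len(weights) != depth + 1:
        raise ValueError("need exactly K + 1 features and weights for K operators")

    # Start from the first weighted feature, then fold in the remaining ones.
    acc = weights[0] * features[feature_idx[0]]
    for k, name in enumerate(operator_names, start=1):
        operand = weights[k] * features[feature_idx[k]]
        acc = OPERATORS[name](acc, operand)
    return acc
```

With K = 7, a call would pass seven operator names and eight feature indices and weights. In the approach described above, those choices are what the evolutionary search would optimize, rather than being fixed by hand as in this sketch.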
Summary
Due to the drastically increased amount of multimedia data available on the Internet and in various public and personal databases, the development of efficient indexing and retrieval methods for large multimedia databases has become a widely studied research topic. Scientific fields such as digital signal processing (DSP) and computer science (machine learning) provide efficient and mathematically well-defined methods for data mining and knowledge discovery from specific observations or databases [1]. Whenever machine learning techniques are applied to data classification or clustering tasks, certain features need to be extracted from the data. The related work presented in this field focuses on the two most important feature enhancement methods in the literature: feature selection and feature synthesis (also known as feature generation/construction/transformation).
More From: EURASIP Journal on Audio, Speech, and Music Processing