Spectral-Domain Augmentation for Cover Song Identification
- Conference Article
2
- 10.1145/1459359.1459542
- Oct 26, 2008
We develop a content-based audio COver Song IdeNtification (COSIN) system to detect and group cover songs. COSIN takes music audio content as input and performs similarity search to locate variants of the input (i.e., cover versions). Identified cover songs are returned in rank order according to their similarity to the input. COSIN also incorporates a set of tools to evaluate retrieval performance (e.g., recall and precision) so researchers can explore different retrieval schemes and parameters. COSIN utilizes a suite of techniques to detect cover songs, including Pitch + Dynamic Programming (DP), Chroma + DP, and Semantic Feature Summarization (SFS) + Hash-Based Approximate Matching (HBAM). The demonstration system shows that COSIN is a promising music content retrieval tool. Running several music retrieval schemes on the COSIN platform, recent experiments with SFS + LSH variants demonstrate a well-balanced trade-off between efficiency (search speed) and performance (search accuracy).
- Conference Article
3
- 10.1109/ibssc47189.2019.8973064
- Jul 1, 2019
A cover song, by definition, is a rendition of a previously released song, and mapping cover songs to their original song is defined as "Cover Song Identification." In this paper, we propose multiple cover song identification methods using Convolutional Neural Network (CNN) models as well as transfer learning to extract features that can be trained on statistical models for binary classification. We develop two CNN models that are trained on a cross-similarity matrix generated from a pair of songs as input. First, we designed a simple CNN architecture trained on two labels: (1) cover pair and (2) non-cover pair. Our second approach uses a CNN model known as the Inception model. We train the model by generating cross-similarity matrices for both labels and then converting them into images. At a later stage, we use a ranking method that sorts the probabilities of the cover relation in descending order, and the song with the highest probability is chosen as a match. Based on the evaluation, the Inception model performs best, achieving an accuracy of 93.4%.
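The cross-similarity input described above can be sketched in plain Python. This is an illustrative stand-in, not the paper's exact pipeline: cosine similarity over generic frame-level features substitutes for whatever features and similarity function the authors actually used.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cross_similarity(seq_a, seq_b):
    # One row per frame of song A, one column per frame of song B;
    # cover pairs tend to show high-similarity diagonal stripes.
    return [[cosine(u, v) for v in seq_b] for u in seq_a]

# Toy 3- and 2-frame "songs" with 4-dimensional frame features.
a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
b = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
M = cross_similarity(a, b)
print(M)  # matching frames score 1.0, orthogonal frames 0.0
```

Such a matrix, rendered as a greyscale image, is what the CNNs in the paper classify as cover versus non-cover.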
- Conference Article
4
- 10.1109/mmsp.2016.7813372
- Sep 1, 2016
This paper focuses on cover song identification over a large-scale dataset. Identifying all covers of a query song in a music collection is a challenging task, since covers vary in multiple aspects such as tempo, key, and structure. On large-scale datasets, cover song identification is even more challenging, and few works have been published. Previous works usually use a single representation for a whole song, such as the 2D Fourier transform or chord profiles, which cannot capture the property that cover relationships are largely determined by local similarity. To address this problem, we propose a novel cover song identification method based on music structure segmentation. The proposed structural method identifies cover songs at the section level instead of the song level. The experimental results show that the structural method improves the mean average precision of the 2D Fourier transform method from 9.5% to 12.1%. In addition, we also propose a two-layer cover song identification system to improve efficiency.
- Conference Article
- 10.1117/12.913429
- Oct 1, 2011
Content-based music analysis has drawn much attention due to the rapidly growing digital music market. This paper describes a method that can be used to effectively identify cover songs. A cover song preserves only the crucial melody of its reference song but differs in other acoustic properties. Hence, the beat/chroma-synchronous chromagram, which is insensitive to variations in the timbre or rhythm of songs but sensitive to the melody, is chosen. Key transposition is achieved by cyclically shifting the chromatic domain of the chromagram. By using a Hidden Markov Model (HMM) to obtain the time sequences of songs, the system is made even more robust. Thanks to the Smith-Waterman alignment algorithm, the cover song and its reference need not share a similar structure or length.
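The cyclic chromagram shift used for key transposition can be sketched as follows. This is a minimal illustration; the exhaustive dot-product scoring in `best_transposition` is an assumed stand-in for the paper's actual transposition criterion.

```python
def transpose_chroma(frame, k):
    # Rotate a 12-bin chroma vector upward by k semitones (cyclic shift).
    return frame[-k:] + frame[:-k] if k else frame[:]

def best_transposition(chroma_a, chroma_b):
    # Try all 12 shifts of song B and keep the one whose summed
    # frame-wise dot product with song A is highest.
    def score(k):
        shifted = [transpose_chroma(f, k) for f in chroma_b]
        return sum(sum(x * y for x, y in zip(u, v))
                   for u, v in zip(chroma_a, shifted))
    return max(range(12), key=score)

# Toy example: song B is song A transposed up by 3 semitones,
# so shifting B by another 9 (i.e., -3 mod 12) realigns it with A.
a = [[1.0] + [0.0] * 11, [0.0, 1.0] + [0.0] * 10]
b = [[f[(i - 3) % 12] for i in range(12)] for f in a]
print(best_transposition(a, b))  # 9
```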
- Conference Article
8
- 10.1109/ism.2017.32
- Dec 1, 2017
We introduce Kara1k, a new musical dataset composed of 2,000 analyzed songs, made possible by a partnership with a karaoke company. The dataset is divided into 1,000 cover songs provided by Recisio's Karafun application, and the corresponding 1,000 songs by the original artists. Kara1k is mainly dedicated to cover song identification and singing voice analysis. For both tasks, it offers novel approaches, as each cover song is a studio-recorded song with the same arrangement as the original recording, but with different singers and musicians. Essentia, harmony-analyser, Marsyas, Vamp plugins and YAAFE have been used to extract audio features for each track in Kara1k. We provide metadata such as the title, genre, original artist, year, International Standard Recording Code and the ground truths for the singer's gender, backing vocals, duets and lyrics' language. Additionally, we provide the instrumental track and the pure singing voice track for each cover song. We showcase two use-case experiments for Kara1k. In the cover song identification task using the Dynamic Time Warping method, we provide a comparison of traditional and new features: chroma and MFCC features, chords and keys, and chroma and chord distances. We obtain 84-89% identification accuracy for three of the features, which justifies our focus on karaoke songs. In the supporting experiment on singer gender classification, we evaluate the difference in performance in two conditions - a pure singing voice and the singing voice mixed with the background music. The Kara1k dataset is freely available on the KaraMIR project website.
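The Dynamic Time Warping comparison used in the cover song identification experiment can be sketched with a textbook DTW over frame sequences; the authors' exact features, step pattern, and constraints are not reproduced here.

```python
def dtw_distance(seq_a, seq_b):
    # Classic dynamic time warping with Euclidean frame cost:
    # D[i][j] is the best alignment cost of prefixes of length i and j.
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A tempo-stretched copy (one repeated frame) still aligns with zero cost,
# which is why DTW tolerates tempo differences between covers.
a = [[0.0], [1.0], [2.0]]
b = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(a, b))  # 0.0
```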
- Research Article
276
- 10.1109/tasl.2008.924595
- Aug 1, 2008
- IEEE Transactions on Audio, Speech, and Language Processing
We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detect cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the music information retrieval (MIR) community along in the past, as it provides a direct and objective way to evaluate music similarity algorithms. This paper first presents a series of experiments carried out with two state-of-the-art methods for cover song identification. We have studied several components of these (such as chroma resolution and similarity, transposition, beat tracking or dynamic time warping constraints), in order to discover which characteristics would be desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best performing ones are finally applied to the newly proposed method. Multiple evaluations of this one confirm a large increase in identification accuracy when comparing it with alternative state-of-the-art approaches.
- Research Article
2
- 10.3390/app8081383
- Aug 16, 2018
- Applied Sciences
Similarity measurement plays an important role in various information retrieval tasks. In this paper, a music information retrieval scheme based on two-level similarity fusion and post-processing is proposed. At the similarity fusion level, to take full advantage of the common and complementary properties among different descriptors and different similarity functions, first, the track-by-track similarity graphs generated from the same descriptor but different similarity functions are fused with the similarity network fusion (SNF) technique. Then, the obtained first-level fused similarities based on different descriptors are further fused with the mixture Markov model (MMM) technique. At the post-processing level, diffusion is first performed on the two-level fused similarity graph to utilize the underlying track manifold contained within it. Then, a mutual proximity (MP) algorithm is adopted to refine the diffused similarity scores, which helps to reduce the adverse influence of the “hubness” phenomenon on the scores. The performance of the proposed scheme is tested in the cover song identification (CSI) task on three cover song datasets (Covers80, Covers40, and Second Hand Songs (SHS)). The experimental results demonstrate that the proposed scheme outperforms state-of-the-art CSI schemes based on single similarity or similarity fusion.
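The mutual proximity step can be illustrated with a simple rank-based empirical variant. This is only a sketch of the idea of rescaling scores by mutual neighborhood relations; the paper's MP algorithm may use a parametric (e.g., distribution-based) formulation instead.

```python
def mutual_proximity(D):
    # Rank-based empirical mutual proximity: rescale distance D[i][j]
    # into the fraction of other tracks farther from i than j is,
    # times the same fraction from j's point of view.
    n = len(D)
    MP = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                MP[i][j] = 1.0
                continue
            p_i = sum(1 for k in range(n) if k != i and D[i][k] > D[i][j]) / (n - 1)
            p_j = sum(1 for k in range(n) if k != j and D[j][k] > D[j][i]) / (n - 1)
            MP[i][j] = p_i * p_j  # high only for mutually close pairs
    return MP

# Tracks 0 and 1 are mutually close; track 2 is far from both,
# so its refined similarity to them collapses to zero.
D = [[0.0, 1.0, 4.0],
     [1.0, 0.0, 5.0],
     [4.0, 5.0, 0.0]]
print(mutual_proximity(D)[0][1], mutual_proximity(D)[0][2])  # 0.25 0.0
```

Because a "hub" track that is near everything is rarely a mutual nearest neighbour, this rescaling suppresses hub-inflated scores.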
- Conference Article
2
- 10.1109/mlsp52302.2021.9596389
- Oct 25, 2021
Deep Learning (DL) has recently been applied successfully to the task of Cover Song Identification (CSI). Meanwhile, neural networks that consider music signal data structure in their design have been developed. In this paper, we propose a Pitch Class Key-Invariant Network, PiCKINet, for CSI. Like some other CSI networks, PiCKINet inputs a Constant-Q Transform (CQT) pitch feature. Unlike other such networks, large multi-octave kernels produce a latent representation with pitch class dimensions that are maintained throughout PiCKINet by key-invariant convolutions. PiCKINet is seen to be more effective, and efficient, than other CQT-based networks. We also propose an extended variant, PiCKINet+, that employs a centre loss penalty, squeeze and excite units, and octave swapping data augmentation. PiCKINet+ shows an improvement of ~17% MAP relative to the well-known CQTNet when tested on a set of ~16K tracks.
- Research Article
4
- 10.1016/j.apacoust.2020.107777
- Dec 14, 2020
- Applied Acoustics
Time complexity evaluation of cover song identification algorithms
- Conference Article
17
- 10.1109/icassp39728.2021.9414128
- Jun 6, 2021
We present in this paper ByteCover, a new feature learning method for cover song identification (CSI). ByteCover is built on the classical ResNet model, with two major improvements designed to further enhance its capability for CSI. First, we introduce the integration of instance normalization (IN) and batch normalization (BN) to build IBN blocks, the major components of our ResNet-IBN model. With the help of the IBN blocks, our CSI model can learn features that are invariant to changes in musical attributes such as key, tempo, timbre and genre, while preserving the version information. Second, we employ the BN-Neck method to allow multi-loss training, so that our method jointly optimizes a classification loss and a triplet loss; by this means, the inter-class discrimination and intra-class compactness of cover songs can be ensured at the same time. A set of experiments demonstrates the effectiveness and efficiency of ByteCover on multiple datasets; on the Da-TACOS dataset, ByteCover outperformed the best competing system by 18.0%.
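The multi-loss objective can be sketched numerically with plain-Python stand-ins. The margin value and unit loss weight below are illustrative assumptions; in the actual model, these losses are computed over learned embeddings behind the BN-Neck, which is not reproduced here.

```python
def euclidean(u, v):
    # Euclidean distance between two embedding vectors.
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Pull the cover (positive) closer to the anchor than the
    # non-cover (negative) by at least `margin`.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def joint_loss(cls_loss, tri_loss, weight=1.0):
    # Multi-loss training: classification keeps versions separable
    # across classes, the triplet term keeps each class compact.
    return cls_loss + weight * tri_loss

a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
print(triplet_loss(a, p, n))  # 0.0: positive is already margin-closer than negative
```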
- Research Article
- 10.33851/jmis.2022.9.1.69
- Apr 30, 2022
- Journal of Multimedia Information System
Extraction of a salient chromagram is of utmost importance for cover song identification. A cover song refers to a live performance, a remix, or a new recording of a previously recorded track. This paper utilizes Savitzky-Golay filters in chromagram extraction to suppress the timbre-related components of a music signal, which are not preserved when cover songs are generated. By removing the timbre-related components, the discriminative tonal components, which are conducive to cover song identification, are emphasized in the chromagram. Experiments on cover song identification over two datasets show that Savitzky-Golay filters are more effective in reducing timbre effects in the chromagram than other types of filters.
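A Savitzky-Golay pass over one chroma bin's time series can be sketched with the classic 5-point quadratic smoothing kernel. The paper's actual filter order, window length, and point of application in the extraction chain are not assumed here.

```python
# Classic 5-point quadratic Savitzky-Golay smoothing coefficients.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def savgol_smooth(x, coeffs=SG5):
    # Least-squares polynomial smoothing realized as a fixed
    # convolution kernel; edge samples are left unfiltered for simplicity.
    h = len(coeffs) // 2
    out = list(x)
    for i in range(h, len(x) - h):
        out[i] = sum(c * x[i + k - h] for k, c in enumerate(coeffs))
    return out

# The coefficients sum to 1, so a constant (tonal) signal passes through,
# while an isolated spike is attenuated.
print(savgol_smooth([0.0, 0.0, 1.0, 0.0, 0.0])[2])  # 17/35 ≈ 0.486
```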
- Conference Article
13
- 10.1109/icassp.2008.4517546
- Mar 1, 2008
Nowadays, the term cover song (or simply cover) can mean any new version, performance, rendition, or recording of a previously recorded track. Cover song identification is a task that has gained popularity in the Music Information Retrieval (MIR) community in recent years, as it provides a direct and objective way of evaluating music similarity. In this paper, we propose a new method for determining the similarity between tonal sequences and, therefore, for identifying cover songs. It is based on a novel chroma similarity measure and on a newly developed dynamic programming local alignment technique. Results confirm that the performance of the proposed system is significantly superior to other state-of-the-art approaches (more than 57% better).
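The dynamic programming local alignment idea can be shown in miniature with Smith-Waterman-style scoring over a generic similarity matrix; the paper's actual chroma similarity measure and gap penalties are not reproduced.

```python
def local_alignment_score(sim, gap=0.5):
    # Smith-Waterman-style local alignment over a similarity matrix:
    # the best local path score indicates how cover-like the pair is.
    n, m = len(sim), len(sim[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + sim[i - 1][j - 1],  # extend diagonally
                          H[i - 1][j] - gap,                    # gap in one sequence
                          H[i][j - 1] - gap)                    # gap in the other
            best = max(best, H[i][j])
    return best

# Symbol stand-ins for tonal frames: match = +1, mismatch = -1.
a, b = "ACGTACG", "TTACGTT"
sim = [[1.0 if x == y else -1.0 for y in b] for x in a]
print(local_alignment_score(sim))  # 4.0: the shared "ACGT" run
```

Because the score resets to zero outside the best local run, only the shared passage matters, not global song structure.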
- Conference Article
31
- 10.24963/ijcai.2019/673
- Aug 1, 2019
Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment methods, and further breakthrough is hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning toward this task. We show that they could be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is utilized to extract information on different scales and transform songs with different lengths into fixed-dimensional representations. Furthermore, a training scheme is designed to enhance the robustness of our model. Extensive experiments demonstrate that combined with these techniques, our approach is robust against musical variations existing in cover songs and outperforms state-of-the-art methods on several datasets with low time complexity.
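Temporal Pyramid Pooling as described above can be sketched as follows; the max-pooling operator and the level set (1, 2, 4) are illustrative assumptions, not necessarily the paper's configuration.

```python
def temporal_pyramid_pool(seq, levels=(1, 2, 4)):
    # Max-pool a variable-length feature sequence at several temporal
    # resolutions and concatenate the results, giving a fixed-size
    # representation (1 + 2 + 4 = 7 segments times the feature dimension).
    dim = len(seq[0])
    out = []
    for level in levels:
        for part in range(level):
            lo = part * len(seq) // level
            hi = max(lo + 1, (part + 1) * len(seq) // level)
            chunk = seq[lo:hi]
            out.extend(max(f[d] for f in chunk) for d in range(dim))
    return out

short = [[1.0, 0.0], [0.0, 2.0]]      # 2 frames
long = [[1.0, 0.0], [0.0, 2.0]] * 5  # 10 frames
print(len(temporal_pyramid_pool(short)), len(temporal_pyramid_pool(long)))  # 14 14
```

Songs of any length thus map to the same 14-dimensional vector here, which is what lets fixed-size classifiers compare them.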
- Research Article
24
- 10.1109/taslp.2015.2416655
- Jun 1, 2015
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Research Article
1
- 10.1142/s1793351x18400202
- Dec 1, 2018
- International Journal of Semantic Computing
We introduce KaraMIR, a musical project dedicated to karaoke song analysis. Within KaraMIR, we define Kara1k, a dataset composed of 1000 cover songs provided by Recisio's Karafun application, and the corresponding 1000 songs by the original artists. Kara1k is mainly dedicated to cover song identification and singing voice analysis. For both tasks, Kara1k offers novel approaches, as each cover song is a studio-recorded song with the same arrangement as the original recording, but with different singers and musicians. Essentia, harmony-analyser, Marsyas, Vamp plugins and YAAFE have been used to extract audio features for each track in Kara1k. We provide metadata such as the title, genre, original artist, year, International Standard Recording Code and the ground truths for the singer's gender, backing vocals, duets, and lyrics' language. The KaraMIR project focuses on defining new problems and describing features and tools to solve them. We thus provide a comparison of traditional and new features for a cover song identification task using statistical methods, as well as the dynamic time warping method on chroma, MFCC, chords, keys, and chord distance features. A supporting experiment on the singer gender classification task is also proposed. The KaraMIR project website facilitates continued research.