Abstract

In this paper, we propose intra- and inter-feature orthogonal fusion (IIOF) of local and global features obtained from MS-SincResNet or MS-SSincResNet (a variant of MS-SincResNet) for music emotion recognition (MER). Given the raw waveform of a music signal, MS-SincResNet/MS-SSincResNet is first used to learn several 2D representations with different receptive fields and to obtain embeddings carrying time-frequency information from different layers. Local and global features are then extracted from these embeddings. IIOF, which consists of intra-feature orthogonal fusion (OF) and inter-feature OF, integrates the local and global features into a discriminative descriptor for MER. The intra-feature OF enhances the diversity of the global feature, while the inter-feature OF reduces redundancy and produces complementary information between the local and global features. Experimental results demonstrate that IIOF, by taking feature orthogonality into account, enhances the discriminability of the representation. Furthermore, extensive experiments show that the proposed method outperforms other state-of-the-art methods on both regression and classification tasks on well-known MER datasets, including the DEAM dataset and the PMEmo dataset. The code is available at https://github.com/PeiChunChang/MS-SSincResNet_with_IIOF.
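
To make the idea of reducing redundancy between local and global features more concrete, the following is a minimal sketch of one common form of orthogonal fusion: each local feature vector has its projection onto the global feature removed, so that only the component orthogonal to the global feature remains. The abstract does not give the exact formulation used by IIOF, so this is an illustration under assumptions rather than the authors' method. The function names `orthogonal_component` and `fuse`, the mean pooling, the concatenation step, and the feature shapes are all hypothetical.

```python
import numpy as np

def orthogonal_component(local_feats, global_feat, eps=1e-12):
    """Remove from each local vector its projection onto the global vector.

    local_feats: array of shape (n, d), one local feature per row (assumed layout).
    global_feat: array of shape (d,).
    """
    f = np.asarray(local_feats, dtype=float)
    g = np.asarray(global_feat, dtype=float)
    # Scalar projection coefficient of every local vector onto g.
    coeff = (f @ g) / (g @ g + eps)
    # Subtract the projection; what remains is orthogonal to g.
    return f - np.outer(coeff, g)

def fuse(local_feats, global_feat):
    """Hypothetical fusion: pool the orthogonal parts and append them to g."""
    g = np.asarray(global_feat, dtype=float)
    orth = orthogonal_component(local_feats, g)
    pooled = orth.mean(axis=0)  # assumption: simple average pooling
    return np.concatenate([g, pooled])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local = rng.normal(size=(5, 4))   # 5 local vectors of dimension 4
    glob = rng.normal(size=4)         # one global vector
    descriptor = fuse(local, glob)
    print(descriptor.shape)
```

In this sketch the pooled part of the descriptor carries only information that the global feature does not already express along its own direction, which is one way to read the "complementary information" described above.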
