Abstract

This paper presents a deep similarity matching-based emotion-oriented music video (MV) generation system, called DEMV-matchmaker, which uses an emotion-oriented deep similarity matching (EDSM) metric as a bridge between music and video. Specifically, we adopt an emotional temporal course model (ETCM) to learn, from an emotion-annotated MV corpus, the relationship between music and its emotional temporal phase sequence and the relationship between video and its emotional temporal phase sequence. An emotional temporal structure preserved histogram (ETPH) representation is proposed to retain the recognized emotional temporal phase sequence information for EDSM metric construction. A deep neural network (DNN) is then applied to learn an EDSM metric from the ETPHs of given positive (official) and negative (artificial) MV examples. For MV generation, the EDSM metric measures the similarity between the ETPHs of candidate video and music. Objective and subjective experiments demonstrate that DEMV-matchmaker performs well and generates appealing music videos that enhance the viewing and listening experience.
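
To make the pipeline concrete, the sketch below illustrates the two steps the abstract describes: building a temporally structured phase histogram, and training a deep similarity metric on positive (official) versus negative (artificially mismatched) music/video pairs. This is a minimal illustrative sketch only; all names, dimensions, and training details (NUM_PHASES, NUM_SEGMENTS, etph, EDSMNet, the loss, the synthetic data) are our assumptions, not the paper's actual ETCM/ETPH/EDSM formulations.

```python
# Illustrative sketch of ETPH construction and EDSM metric learning.
# All identifiers and hyperparameters here are hypothetical assumptions.
import torch
import torch.nn as nn

NUM_PHASES = 4     # assumed number of emotional temporal phases
NUM_SEGMENTS = 8   # assumed temporal segments that preserve ordering

def etph(phase_seq: torch.Tensor) -> torch.Tensor:
    """Toy ETPH: split the recognized phase sequence into temporal
    segments and concatenate per-segment phase histograms, so the
    temporal structure of the phase sequence is preserved."""
    chunks = phase_seq.chunk(NUM_SEGMENTS)
    hists = [torch.bincount(c, minlength=NUM_PHASES).float() / max(len(c), 1)
             for c in chunks]
    return torch.cat(hists)  # shape: (NUM_SEGMENTS * NUM_PHASES,)

class EDSMNet(nn.Module):
    """Toy similarity network scoring a (music ETPH, video ETPH) pair."""
    def __init__(self, dim: int = NUM_SEGMENTS * NUM_PHASES):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, m: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([m, v], dim=-1)).squeeze(-1)

# Train on positive (official MV) and negative (mismatched) pairs,
# here simulated with random stand-in phase sequences.
model = EDSMNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    music_seq = torch.randint(0, NUM_PHASES, (64,))
    video_pos = music_seq.clone()                    # official pairing
    video_neg = torch.randint(0, NUM_PHASES, (64,))  # artificial mismatch
    m, vp, vn = etph(music_seq), etph(video_pos), etph(video_neg)
    logits = torch.stack([model(m, vp), model(m, vn)])
    loss = loss_fn(logits, torch.tensor([1.0, 0.0]))
    opt.zero_grad(); loss.backward(); opt.step()

# At generation time, rank candidate video segments by EDSM score.
candidates = [torch.randint(0, NUM_PHASES, (64,)) for _ in range(5)]
scores = [model(etph(music_seq), etph(c)).item() for c in candidates]
best = max(range(len(scores)), key=scores.__getitem__)
```

In this sketch the learned network plays the role of the EDSM metric: at generation time, each candidate video's ETPH is scored against the music's ETPH and the highest-scoring candidate is selected.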
