Abstract
With advances in hardware and the proliferation of Web 2.0, there is an increasing need for efficient access to desired content in large-scale databases. Content-based copy detection (CBCD) has attracted growing research interest because of its broad applications. However, most existing CBCD approaches focus on a single view only (visual or audio). In this paper, we address the CBCD task by considering visual and audio information simultaneously. For the visual cue, we propose a bag-of-visual-concept-words model for video copy detection; its main advantage is that the presence of semantic concepts is highly robust to spatial and temporal video transformations. For the audio cue, we present a bag-of-audio-words model in which a coherency vocabulary combined with a soft-weighting strategy enables fast and accurate indexing. Finally, late fusion is adopted to combine the visual-only and audio-only results into the final audio–visual result. Extensive experiments are conducted on two large-scale data sets, and competitive results are achieved.
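The late-fusion step can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so everything here is an assumption for illustration: the function names `min_max_normalize` and `late_fusion`, the min-max score normalization, the weighted-sum rule, and the `visual_weight` parameter are hypothetical and not taken from the paper.

```python
def min_max_normalize(scores):
    """Rescale a {video_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every candidate as equally strong.
        return {vid: 1.0 for vid in scores}
    return {vid: (s - lo) / (hi - lo) for vid, s in scores.items()}


def late_fusion(visual_scores, audio_scores, visual_weight=0.5):
    """Fuse per-modality copy-detection scores by a weighted sum.

    Assumption: each detector returns higher scores for more likely copies.
    A candidate missing from one modality contributes 0 for that modality.
    """
    v = min_max_normalize(visual_scores)
    a = min_max_normalize(audio_scores)
    fused = {
        vid: visual_weight * v.get(vid, 0.0)
        + (1.0 - visual_weight) * a.get(vid, 0.0)
        for vid in set(v) | set(a)
    }
    # Rank by descending fused score; break ties by id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    visual = {"a": 0.9, "b": 0.5, "c": 0.1}  # hypothetical visual-only scores
    audio = {"a": 0.8, "b": 0.2}             # hypothetical audio-only scores
    for vid, score in late_fusion(visual, audio):
        print(vid, round(score, 3))
```

Fusing at the score level keeps the two detectors independent, so either one can be retrained or replaced without touching the other; the weight would normally be tuned on held-out data.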