Abstract

Micro-videos have gained widespread acceptance as a prominent form of user-generated content across social media platforms. Accurate event analysis of micro-videos can greatly benefit many social media applications. Although some studies have shown promising results from a multimodal perspective, extracting informative cues from inaccurate modalities remains a challenge, particularly for the text modality, which is prone to inaccuracies and noise. In this paper, we propose a multimodal semantically enhanced representation network (MSERN) for micro-video event detection. To better handle inaccurate and noisy text, we first extract visual concepts in the form of adjective-noun pairs (ANPs) through a fine-grained common representation module to complement the textual descriptions. To maximize the acquisition of modality-specific cues from both the visual and textual modalities, we then implement a coarse-grained private representation module, which ensures that the private representations capture facets unique to each modality beyond the common perspective. Finally, so that the two modules collaborate, the fine-grained common and coarse-grained private representations are integrated into a reinforced micro-video representation. We evaluate the proposed method on a micro-video event detection dataset, and the experimental results show superior performance compared with state-of-the-art methods.
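
The common/private split described above can be illustrated with a minimal sketch. It is not the MSERN implementation: the function name `fuse_common_private`, the random projection matrices (stand-ins for learned weights), the `tanh` activation, the averaging of the two common projections, and fusion by concatenation are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def fuse_common_private(visual, text, dim=8, seed=0):
    """Hypothetical sketch of a common + private representation fusion.

    visual, text: 1-D feature vectors for the two modalities.
    Returns one vector of length 3 * dim.
    """
    rng = np.random.default_rng(seed)

    # Random matrices stand in for learned projection weights (assumption).
    w_vis_common = rng.standard_normal((visual.shape[0], dim))
    w_txt_common = rng.standard_normal((text.shape[0], dim))
    w_vis_private = rng.standard_normal((visual.shape[0], dim))
    w_txt_private = rng.standard_normal((text.shape[0], dim))

    # Common representation: both modalities are mapped into one shared
    # space and averaged (assumed combination rule).
    common = 0.5 * (np.tanh(visual @ w_vis_common) + np.tanh(text @ w_txt_common))

    # Private representations: one separate projection per modality.
    vis_private = np.tanh(visual @ w_vis_private)
    txt_private = np.tanh(text @ w_txt_private)

    # Integration by concatenation (assumed fusion operator).
    return np.concatenate([common, vis_private, txt_private])
```

Calling `fuse_common_private(np.ones(5), np.ones(3))` returns a vector of length 24: eight common dimensions followed by eight private dimensions for each modality. In a trained model the projections would be learned, and the common branch would typically be encouraged to align the modalities while the private branches are kept distinct from it.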
