Abstract

Viewer gifting is an important business model in the live streaming industry, closely tied to the income of both platforms and streamers. Previous studies on gifting prediction are often limited to cross-sectional data and consider the problem from the macro perspective of an entire live stream, ignoring the multimodal information in live streaming content and its cumulative effect over time on viewer gifting behavior. In this paper, we propose a multimodal time-series method (MTM) for predicting real-time gifting. The core module of the method is multimodal time-series analysis (MTA), which aims to fuse multimodal information effectively. Specifically, the proposed orthogonal projection (OP) model promotes cross-modal information interaction without introducing additional parameters. To enable interaction among modalities at the same representation level, we also design a stackable joint representation layer, which allows each target modality's representation (visual, acoustic, and textual) to benefit from all the other modalities. Residual connections are introduced as well to integrate low-level and high-level information. On our dataset, our model outperforms other advanced models by at least 8% in F1. Meanwhile, MTA meets the real-time requirements of the live streaming setting and has demonstrated robustness and transferability on other tasks. Our research may offer insights into how to fuse multimodal information efficiently and contribute to research on viewer gifting behavior prediction in the live streaming context.
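The abstract does not spell out the OP model's mechanics, but a parameter-free cross-modal projection is commonly realized as vector rejection: the component of a source modality's features orthogonal to the target modality's features carries complementary information and can be added to the target without any learned weights. Below is a minimal sketch under that assumption; the function names orthogonal_project and joint_representation_layer, the tensor shapes, and the additive fusion rule are illustrative choices, not the paper's actual implementation.

```python
import torch


def orthogonal_project(source: torch.Tensor, target: torch.Tensor,
                       eps: float = 1e-8) -> torch.Tensor:
    """Return the component of `source` orthogonal to `target`.

    Parameter-free: only dot products and scaling, no learned weights.
    Both tensors have shape (..., d), one feature vector per time step.
    """
    scale = (source * target).sum(-1, keepdim=True) / (
        target.pow(2).sum(-1, keepdim=True) + eps)
    return source - scale * target


def joint_representation_layer(visual, acoustic, textual):
    """One hypothetical stackable layer: each target modality absorbs
    the orthogonal (complementary) components of the other two, with a
    residual path that preserves the target's own representation."""
    modalities = {"v": visual, "a": acoustic, "t": textual}
    out = {}
    for name, tgt in modalities.items():
        fused = tgt  # residual path: keep the target's own information
        for other_name, src in modalities.items():
            if other_name != name:
                fused = fused + orthogonal_project(src, tgt)
        out[name] = fused
    return out["v"], out["a"], out["t"]


# Toy usage: batch of 2 sequences, 5 time steps, 64-dim features.
v = torch.randn(2, 5, 64)
a = torch.randn(2, 5, 64)
t = torch.randn(2, 5, 64)
v2, a2, t2 = joint_representation_layer(v, a, t)
```

Because the layer maps three same-shaped representations to three same-shaped outputs, it can be applied repeatedly to build depth, matching the "stackable" property described above, and the `fused = tgt` starting point plays the role of the residual connection.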
