Multitask Mixture of Sequential Experts for User Activity Streams

Zhen Qin,Zhe Chen,Donald Metzler,Zhe Zhao,Jingzheng Qin,Yicheng Cheng

doi:10.1145/3394486.3403359

Abstract

It is often desirable to model multiple objectives in real-world web applications, such as user satisfaction and user engagement in recommender systems. Multi-task learning has become the standard approach for such applications recently. While most of the multi-task recommendation model architectures proposed to date are focusing on using non-sequential input features (e.g., query and context), input data is often sequential in real-world web application scenarios. For example, user behavior streams, such as user search logs in search systems, are naturally atemporal sequence. Modeling user sequential behaviors as explicit sequential representations can empower the multi-task model to incorporate temporal dependencies, thus predicting future user behavior more accurately. Furthermore, user activity streams can come from heterogeneous data sources, such as user search logs and user browsing logs. They typically possess very different properties such as data sparsity and thus need careful treatment when being modeled jointly. In this work, we study the challenging problem of how to model sequential user behavior in the neural multi-task learning settings. Our major contribution is a novel framework, Mixture of Sequential Experts (MoSE). It explicitly models sequential user behavior using Long Short-Term Memory (LSTM) in the state-of-art Multi-gate Mixture-of-Expert multi-task modeling framework. In experiments, we show the effectiveness of the MoSE architecture over seven alternative architectures on both synthetic and noisy real-world user data in G Suite. We also demonstrate the effectiveness and flexibility of the MoSE architecture in a real-world decision making engine in GMail that involves millions of users, balancing between search quality and resource costs.

Full Text