Over the past few years, deep neural networks have been applied to Click-Through Rate (CTR) estimation. The main task of a CTR model is to predict a user's positive or negative response to an item. Most existing CTR models consist of two parts: a deep neural network (DNN) and a wide model (e.g., a deep cross network). The wide model is generally designed to learn feature interactions and contains many more parameters than the DNN. As a consequence, it is quite difficult for existing CTR models to run on resource-limited devices, yet blindly removing the wide model significantly degrades predictive performance. In this paper, we adopt knowledge distillation to train a single multi-branch network, assembling all branches into a teacher network. Each branch is a simple DNN that not only matches the ground-truth label distribution but also aligns with the prediction distribution of the teacher network. One of the branches, termed the Adaptive Deep Neural Network (ADNN), is trained independently and further combined with a wide model to learn feature interactions. Our method does not require pre-training any high-capacity teacher model, which makes it more efficient than existing approaches. Experimental results on the Criteo and Avazu datasets show that the hybrid model outperforms state-of-the-art methods, and that the lightweight ADNN alone achieves accuracy competitive with certain modern complex models, demonstrating the superiority of the methodology.
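The training scheme described above (each branch supervised by both the ground-truth labels and the ensembled teacher prediction, with no pre-trained teacher) can be illustrated roughly as follows. This is a minimal PyTorch sketch under our own assumptions, not the authors' implementation: the branch widths, the number of branches, the loss weight `alpha`, and averaging branch logits as the ensembling rule are all hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchDNN(nn.Module):
    """One simple DNN branch over input features (hypothetical layer sizes)."""
    def __init__(self, in_dim, hidden=(256, 128)):
        super().__init__()
        layers, d = [], in_dim
        for h in hidden:
            layers += [nn.Linear(d, h), nn.ReLU()]
            d = h
        layers.append(nn.Linear(d, 1))  # one CTR logit per example
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)  # shape: (batch,)

class MultiBranchDistill(nn.Module):
    """Multi-branch network; the averaged branch logit acts as the teacher."""
    def __init__(self, in_dim, n_branches=3):
        super().__init__()
        self.branches = nn.ModuleList(BranchDNN(in_dim) for _ in range(n_branches))

    def forward(self, x):
        logits = torch.stack([b(x) for b in self.branches])  # (n_branches, batch)
        teacher = logits.mean(dim=0).detach()  # ensemble teacher; no gradient flows back
        return logits, teacher

def distill_loss(logits, teacher, y, alpha=0.5):
    """Each branch matches the labels and aligns with the teacher's soft prediction."""
    soft = torch.sigmoid(teacher)  # teacher's predicted click probability
    loss = 0.0
    for l in logits:
        loss += F.binary_cross_entropy_with_logits(l, y)             # ground-truth term
        loss += alpha * F.binary_cross_entropy_with_logits(l, soft)  # distillation term
    return loss / logits.shape[0]

# Demo training step on random data (feature dimension 64 is arbitrary).
model = MultiBranchDistill(in_dim=64)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 64), torch.randint(0, 2, (32,)).float()
logits, teacher = model(x)
loss = distill_loss(logits, teacher, y)
opt.zero_grad(); loss.backward(); opt.step()
```

One backward pass updates all branches jointly, so no separate teacher pre-training stage is needed; after training, a single branch can be deployed on-device. The sketch omits the paper's further step of combining one branch (ADNN) with a wide model to learn feature interactions.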