Abstract

In this paper, we start by pointing out a problem with the negative sampling (NS) strategy, denoted as nearest-NS (NNS), used in metric learning (ML)-based recommendation methods. NNS samples, among a user's unrated items, those nearer to her with higher probability. This can push her preferred items far away from her, thereby excluding them from top-K recommendation. To address this problem, we first define the concept of a cage for a user: a region that contains the items she is highly likely to prefer. Based on this concept, we propose a novel NS strategy, named cage-based NS (CNS), under which her preferred items are rarely sampled as negative items, thereby improving the accuracy of top-K recommendation. Furthermore, we propose CNS+, an improved version of CNS that reduces its computational overhead. CNS+ provides much higher performance (i.e., speed) than CNS while not sacrificing accuracy. Through extensive experiments on four real-life datasets, we validate the effectiveness (i.e., accuracy) and efficiency (i.e., performance) of the proposed approach. We first demonstrate that our CNS strategy successfully addresses the problem of the NNS strategy. In addition, we show that applying CNS to three existing ML-based recommendation methods (i.e., CML, LRML, and SML) improves their accuracy consistently and significantly on all datasets and with all metrics. We also confirm that CNS+ significantly reduces the execution time of CNS with (almost) no loss of accuracy. Finally, we show that our CNS and CNS+ strategies scale linearly with the number of ratings.
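To make the contrast concrete, below is a minimal Python/NumPy sketch of the two sampling behaviors described above. It is an illustrative approximation only: the distance-weighted softmax in `nns_negative_sample` and the fixed `cage_radius` in `cns_negative_sample` are our assumptions for exposition, not the paper's exact formulations of NNS or the cage.

```python
import numpy as np

def nns_negative_sample(user_vec, item_vecs, unrated_ids, rng):
    """Nearest-NS (NNS) sketch: among a user's unrated items, sample
    those nearer to the user with higher probability. This is how a
    preferred-but-unrated item can be drawn as a negative and pushed away."""
    dists = np.linalg.norm(item_vecs[unrated_ids] - user_vec, axis=1)
    weights = np.exp(-dists)          # closer item => larger weight
    return rng.choice(unrated_ids, p=weights / weights.sum())

def cns_negative_sample(user_vec, item_vecs, unrated_ids, cage_radius, rng):
    """Cage-based NS (CNS) sketch: treat items inside a hypothetical
    ball of radius `cage_radius` around the user as likely preferred,
    and sample negatives only from items outside that region."""
    unrated_ids = np.asarray(unrated_ids)
    dists = np.linalg.norm(item_vecs[unrated_ids] - user_vec, axis=1)
    outside_cage = unrated_ids[dists > cage_radius]
    pool = outside_cage if outside_cage.size > 0 else unrated_ids
    return rng.choice(pool)

# Usage with toy embeddings (dimensions and radius chosen arbitrarily):
rng = np.random.default_rng(0)
user = rng.normal(size=16)
items = rng.normal(size=(100, 16))
unrated = np.arange(100)
neg_nns = nns_negative_sample(user, items, unrated, rng)
neg_cns = cns_negative_sample(user, items, unrated, cage_radius=3.0, rng=rng)
```

The key design difference is that NNS biases sampling toward nearby items, whereas the cage-based sampler excludes the near (likely-preferred) region entirely, so those items are rarely, if ever, treated as negatives.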
