Common Pitfalls in Training and Evaluating Recommender Systems

Hung-Hsuan Chen,Wen Tsui,Chu-An Chung,Hsin-Chien Huang

doi:10.1145/3137597.3137601

Abstract

This paper formally presents four common pitfalls in training and evaluating recommendation algorithms for information systems. Specifically, we show that it could be problematic to separate the server logs into training and test data for model generation and model evaluation if the training and the test data are selected improperly. In addition, we show that click through rate { a common metric to measure and compare the performance of different recommendation algorithms -- may not be a good measurement of profitability { the income a recommendation module brings to a website. Moreover, we demonstrate that evaluating recommendation revenue may not be a straightforward task as it first looks. Unfortunately, these pitfalls appeared in many previous studies on recommender systems and information systems. We explicitly explain these problems and propose methods to address them. We conducted experiments to support our claims. Finally, we review previous papers and competitions that may suffer from these problems.

Full Text