Evaluating recommender systems adequately and thoroughly is an important task. Significant efforts are dedicated to proposing metrics, methods, and protocols for doing so. However, there has been little discussion in the recommender systems’ literature on the topic of testing. In this work, we adopt and adapt concepts from the software testing domain, e.g., code coverage, metamorphic testing, or property-based testing, to help researchers to detect and correct faults in recommendation algorithms. We propose a test suite that can be used to validate the correctness of a recommendation algorithm, and thus identify and correct issues that can affect the performance and behavior of these algorithms. Our test suite contains both black box and white box tests at every level of abstraction, i.e., system, integration, and unit. To facilitate adoption, we release RecPack Tests , an open-source Python package containing template test implementations. We use it to test four popular Python packages for recommender systems: RecPack , PyLensKit , Surprise , and Cornac . Despite the high test coverage of each of these packages, we find that we are still able to uncover undocumented functional requirements and even some bugs. This validates our thesis that testing the correctness of recommendation algorithms can complement traditional methods for evaluating recommendation algorithms.
Read full abstract