Abstract

Reproducibility is a key requirement for scientific progress: it allows the work of others to be reproduced and, as a consequence, the reported claims and results to be fully trusted. In this work, we argue that, by facilitating the reproducibility of recommender systems experimentation, we indirectly address the issues of accountability and transparency in recommender systems research from the perspective of practitioners, designers, and engineers who aim to assess the capabilities of published research works. These issues have become increasingly prevalent in recent literature. Reasons for this include societal movements around intelligent systems and artificial intelligence striving toward fair and objective use of human behavioral data (as in Machine Learning, Information Retrieval, or Human–Computer Interaction). Society has grown to expect explanations and transparency standards regarding the underlying algorithms that make automated decisions for and around us. This work surveys existing definitions of these concepts and proposes a coherent terminology for recommender systems research, with the goal of connecting reproducibility to accountability. We achieve this by introducing several guidelines and steps that lead to reproducible and, hence, accountable experimental workflows and research. We additionally analyze several recommender system implementations available in the literature and discuss the extent to which they fit the introduced framework. With this work, we aim to shed light on this important problem and facilitate progress in the field by increasing the accountability of research.

Highlights

  • Evaluation of Recommender Systems (RS) is an active and open research topic

  • We focus on how to ensure a degree of reproducibility, so as to increase the transparency of reported experimental settings and, as a consequence, of the corresponding recommender systems evaluation results, which is a well-known catalyst for accountability

  • Increasing the reproducibility of research works improves accountability, since reproducible environments guarantee that results can be audited and, under some constraints, repeated, making it easier to reach the same conclusions (see the sketch after this list)

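As a sketch of what an auditable, repeatable setup can look like in practice, the snippet below records the experimental settings (random seed, runtime environment, and a dataset checksum) in a manifest file. The file name, the recorded fields, and the JSON format are illustrative assumptions, not guidelines prescribed by the paper itself.

```python
# Illustrative sketch: record an experiment manifest so a run can later be
# audited and repeated. Field names and file paths are assumptions.
import hashlib
import json
import platform
import random
import sys

SEED = 1234
random.seed(SEED)  # fix the random seed so the run can be repeated exactly

def dataset_checksum(path: str) -> str:
    """SHA-256 of the dataset file, so auditors can verify the exact input data."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

manifest = {
    "seed": SEED,
    "python_version": sys.version,
    "platform": platform.platform(),
    # "dataset_sha256": dataset_checksum("ratings.csv"),  # hypothetical dataset path
    "hyperparameters": {"factors": 50, "learning_rate": 0.01},  # example values only
}

# Persist the manifest alongside the results so the settings can be audited.
with open("experiment_manifest.json", "w") as out:
    json.dump(manifest, out, indent=2)

print(json.dumps(manifest, indent=2))
```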

Summary

Introduction

Evaluation of Recommender Systems (RS) is an active and open research topic. A significant share of Recommender Systems research is based on comparisons of recommendation algorithms’ predictive accuracy: the better the evaluation scores (higher accuracy or lower predictive errors), the better the recommendation. The actual implementation of a recommendation algorithm can diverge considerably from the well-known, or ideal, formulation due to manual tuning and alignment to specific situations. These divergences have recently been termed data processing or data collection biases (Olteanu et al., 2018). As a sign of the field's popularity, many open positions in machine learning, data science, and artificial intelligence list experience with recommendation techniques and frameworks as a requirement. This gain in popularity has led to an overwhelming amount of research being conducted in recent years. A critical analysis is necessary to ensure genuine advances in the field, rather than marginal improvements driven by strategic design choices (Ferrari Dacrema et al., 2019).
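To make the accuracy-based comparison described above concrete, the following is a minimal sketch of an offline evaluation loop. The toy rating data, the two baseline recommenders (global mean and per-item mean), and the choice of RMSE as the error metric are illustrative assumptions, not the protocol of any particular published study.

```python
# Minimal sketch of accuracy-based offline evaluation of recommenders.
# Data, baselines, and the train/test split are illustrative assumptions.
import math
import random
from collections import defaultdict

random.seed(42)  # fixing the seed is one small step toward reproducibility

# Toy interaction data: (user, item, rating)
ratings = [(u, i, float(random.randint(1, 5)))
           for u in range(50) for i in random.sample(range(100), 20)]

random.shuffle(ratings)
split = int(0.8 * len(ratings))
train, test = ratings[:split], ratings[split:]

def global_mean_model(train):
    """Predict the same global average rating for every (user, item) pair."""
    mean = sum(r for _, _, r in train) / len(train)
    return lambda user, item: mean

def item_mean_model(train):
    """Predict the item's average rating, falling back to the global mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, item, r in train:
        sums[item] += r
        counts[item] += 1
    global_mean = sum(r for _, _, r in train) / len(train)
    item_means = {i: sums[i] / counts[i] for i in sums}
    return lambda user, item: item_means.get(item, global_mean)

def rmse(model, test):
    """Root mean squared error of the model's predictions on held-out ratings."""
    errors = [(model(u, i) - r) ** 2 for u, i, r in test]
    return math.sqrt(sum(errors) / len(errors))

# Compare the baselines: lower RMSE is interpreted as a "better" recommender.
for name, builder in [("global mean", global_mean_model),
                      ("item mean", item_mean_model)]:
    print(f"{name}: RMSE = {rmse(builder(train), test):.3f}")
```

Note that this sketch only illustrates the predictive-accuracy view of evaluation; the divergences and design choices discussed above concern exactly the details (data splits, tuning, filtering) that such a loop leaves implicit.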
