Abstract
This work presents a comprehensive study, from an industrial perspective, of the process between the collection of raw data and the generation of next-item recommendations in the Video-on-Demand (VoD) domain. Most research papers focus on analyzing recommender systems on already-processed datasets, so they do not face the challenges that arise naturally in industry, e.g., processing raw interaction logs to create datasets for testing. This paper describes the whole process between data collection and recommendation, including cleaning, processing, feature engineering, and session inference, together with all the challenges posed by a dataset provided by an industrial player in the domain. It then compares several intent-based recommendation techniques on the new dataset in the next-item recommendation task, studying the impact of factors such as session length and the number of previous sessions available for a user. The results show that exploiting the sequential data available in the dataset benefits recommendation quality: deep learning algorithms for session-aware recommendation are consistently the most accurate recommenders. Lastly, the paper summarizes the different challenges in the VoD domain, discusses the best algorithmic solutions found, and proposes future research directions based on the results obtained.
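To make the next-item recommendation task concrete, here is a minimal evaluation sketch. It assumes a leave-last-out protocol, which is a common convention and not necessarily the paper's exact setup. For each session, a recommender sees every item except the last one. It is then scored by hit rate at k (HR@k): the fraction of sessions whose held-out last item appears in the top-k list. The names `hit_rate_at_k` and `make_popularity_recommender` are hypothetical. The popularity baseline is only a stand-in for the intent-based and deep learning recommenders compared in the study.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical interface: a recommender maps a session prefix to a ranked list of item ids.
Recommender = Callable[[Sequence[str]], List[str]]


def hit_rate_at_k(sessions: List[List[str]], recommend: Recommender, k: int = 10) -> float:
    """Fraction of sessions whose held-out last item appears in the top-k list.

    Assumes a leave-last-out protocol: session[:-1] is the input prefix and
    session[-1] is the target. Sessions with fewer than 2 items are skipped.
    """
    hits, evaluated = 0, 0
    for session in sessions:
        if len(session) < 2:
            continue  # no prefix/target pair can be formed
        prefix, target = session[:-1], session[-1]
        evaluated += 1
        if target in recommend(prefix)[:k]:
            hits += 1
    return hits / evaluated if evaluated else 0.0


def make_popularity_recommender(train_sessions: List[List[str]]) -> Recommender:
    """Hypothetical baseline: rank items by global frequency, excluding items already in the prefix."""
    counts = Counter(item for session in train_sessions for item in session)
    ranked = [item for item, _ in counts.most_common()]

    def recommend(prefix: Sequence[str]) -> List[str]:
        seen = set(prefix)
        return [item for item in ranked if item not in seen]

    return recommend


if __name__ == "__main__":
    train = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"]]
    test = [["a", "c"], ["b", "d"], ["e"]]  # the single-item session is skipped
    rec = make_popularity_recommender(train)
    print(hit_rate_at_k(test, rec, k=1))  # 0.5
    print(hit_rate_at_k(test, rec, k=3))  # 1.0
```

Swapping a session-aware model in for the popularity baseline only requires another function with the same `Recommender` signature. Grouping sessions by length before calling `hit_rate_at_k` gives a per-length breakdown similar in spirit to the analysis described above.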
Highlights
The benefits of recommender systems are clear.
The size of industrial datasets can be orders of magnitude larger than that of research datasets, and they are highly noisy. Given these differences between research and industrial datasets, understanding the construction and processing steps that lead from a low-level interaction log to a high-level, algorithm-ready dataset is important for the research community.
This study considers the problem of session-based recommendation, in which a session is a group of user actions that occur in a continuous period of time, e.g., the user logs in to the platform and interacts with a few VoD titles before logging out.
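Raw logs rarely mark where a session starts and ends, so sessions are usually inferred from timestamps. The sketch below shows one common heuristic, which is an assumption here and not necessarily the procedure used in the paper. Each user's events are sorted chronologically, and a new session starts whenever the idle time since the previous event exceeds a threshold. The 30-minute default, the `Event` record, and the `infer_sessions` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """Hypothetical log record: one user interaction with one VoD item."""
    user_id: str
    item_id: str
    timestamp: datetime


def infer_sessions(events: List[Event], gap: timedelta = timedelta(minutes=30)) -> Dict[str, List[List[Event]]]:
    """Split each user's events into sessions wherever the idle time exceeds `gap`.

    The 30-minute default is an assumed inactivity threshold, not a value from the paper.
    """
    sessions: Dict[str, List[List[Event]]] = {}
    # Sort by user, then time, so each user's events are contiguous and chronological
    # (itertools.groupby only groups consecutive elements).
    ordered = sorted(events, key=lambda e: (e.user_id, e.timestamp))
    for user_id, user_events in groupby(ordered, key=lambda e: e.user_id):
        user_sessions: List[List[Event]] = []
        for event in user_events:
            if user_sessions and event.timestamp - user_sessions[-1][-1].timestamp <= gap:
                user_sessions[-1].append(event)  # still within the current session
            else:
                user_sessions.append([event])  # first event or idle gap exceeded: new session
        sessions[user_id] = user_sessions
    return sessions


if __name__ == "__main__":
    t0 = datetime(2021, 1, 1, 20, 0)
    log = [
        Event("u1", "v1", t0),
        Event("u1", "v2", t0 + timedelta(minutes=10)),
        Event("u1", "v3", t0 + timedelta(hours=3)),
        Event("u2", "v1", t0 + timedelta(minutes=5)),
    ]
    result = infer_sessions(log)
    print([[e.item_id for e in s] for s in result["u1"]])  # [['v1', 'v2'], ['v3']]
```

The threshold is a design choice. A longer gap merges a whole evening of viewing into one session, while a shorter gap can split a single binge-watching session. Either choice changes the session-length distribution that the recommenders are later evaluated on.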
Summary
The benefits of recommender systems are clear. They guide users in the exploration of immense product catalogs, and they leverage user behavior to generate accurate, interesting, novel, and serendipitous recommendations [1]–[3]. Industrial datasets, i.e., datasets constructed from the logs of real-world recommenders, pose different challenges in their construction and usage, and they reproduce a real application scenario more accurately. They also exhibit impression bias, i.e., the fact that the pattern of interactions between users and items is influenced by how items are presented to the users [6]. The size of industrial datasets can be orders of magnitude larger than that of research datasets, and they are highly noisy. Given these differences between research and industrial datasets, understanding the construction and processing steps that lead from a low-level interaction log to a high-level, algorithm-ready dataset is important for the research community. Few works using industrial datasets [6], [9] have addressed these aspects.