Abstract

Nowadays recommendation systems are an important part of everyday life, used for industrial, business, and even academic purposes. Various recommendation techniques are currently available, such as content-based filtering [11], matrix factorization [6], the Naive Bayes classifier [1,12], logistic regression [1], and multi-armed bandit approaches [7,18]. Unfortunately, all of them treat recommendation as a static process, assuming that user preferences do not change over time, and they do not take into account that user interactions with the system are sequential. These methods and algorithms are widely used by giant companies such as Amazon [9], Netflix [4,15], and Google [2]. This article demonstrates the use of machine learning, and reinforcement learning in particular, for item recommendation based on users' positive interactions with the system. Unlike the others, this approach is dynamic: it adapts to changes in user preferences and focuses on maximizing the benefit of the user's interaction with the recommendation system. In addition, reinforcement learning optimizes for long-term reward and treats user interactions with the system as a sequential process. The article describes an example of applying this approach to recommending movies based on the positive ratings users have given them; in general, however, the technique can be applied to any type of item, the only requirement being a numeric representation of the items suitable for a neural network.

The environment of the proposed recommendation system is modeled as a Markov Decision Process [19] in which the user interacts with a recommendation agent, and that agent generates proposed items for the specific user. The neural network architecture is built on the "Actor-Critic" learning model [19], which combines two learning methods: policy learning in the "Actor" network and Q-learning in the "Critic" network. The "Actor" network takes the user's positive interactions with the system as input and generates an approximate recommendation. The "Critic" network then takes that approximate recommendation, concatenates it with the positive interactions, and computes a Q-value that estimates how good the approximate recommendation will be. The combination of these two learning techniques accounts both for dynamic adaptation to changing user preferences and for long-term interactions, which leads to fast training and better recommendations. The main purpose of the process is to maximize reward, meaning that the proposed recommendation will be positive for both the user and the system.

Inside the system, each movie is represented as an array of numeric values so that it can be processed by the neural network. The representations are built from the specific characteristics of each movie, such as year of production, director, genres, country of production, production companies, rating, popularity, budget, etc. Principal component analysis is used to combine all these properties. In addition, a "State representation module" was developed to represent user ratings: it combines the rated movies with the numeric ratings themselves and encodes them in the same way, as an array of numbers. Ten thousand movies and one million ratings from the MovieLens dataset were used, of which 95% of the ratings were used to train the model and the remaining 5% to evaluate its accuracy.
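As a concrete illustration of the movie representation described above, below is a minimal sketch, not the paper's implementation, of compressing per-movie features with principal component analysis. The random feature matrix, the use of scikit-learn, and the 32-dimensional embedding size are all assumptions made for illustration.

```python
# Minimal sketch: compress numeric movie features (year, one-hot genres,
# rating, popularity, budget, ...) into compact embeddings with PCA.
# The feature matrix here is random placeholder data, not MovieLens.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
raw_features = rng.random((10_000, 200))   # 10,000 movies x 200 raw features (assumed)

scaled = StandardScaler().fit_transform(raw_features)          # zero mean, unit variance
movie_embeddings = PCA(n_components=32).fit_transform(scaled)  # keep top 32 components
print(movie_embeddings.shape)              # (10000, 32): one 32-d vector per movie
```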
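The "State representation module" could then combine a user's rated movies with the ratings themselves along the following lines; the fixed window of K movies, the rating-weighted concatenation, and the function name are hypothetical choices, since the abstract only states that movies and ratings are jointly encoded as an array of numbers.

```python
# Hypothetical state encoding: weight each rated movie's embedding by its
# normalized rating and concatenate, yielding a fixed-size state vector.
import numpy as np

ITEM_DIM, K = 32, 3  # embedding size and window of recent positive ratings (assumed)

def state_representation(movie_embs: np.ndarray, ratings: np.ndarray) -> np.ndarray:
    """movie_embs: (K, ITEM_DIM) embeddings of the user's rated movies.
    ratings: (K,) numeric ratings given to those movies.
    Returns a (K * ITEM_DIM,) state vector for the Actor network."""
    weights = ratings / ratings.sum()            # normalize ratings to weights
    return (weights[:, None] * movie_embs).reshape(-1)

state = state_representation(np.random.rand(K, ITEM_DIM), np.array([4.0, 5.0, 3.5]))
print(state.shape)  # (96,)
```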
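Finally, here is a minimal sketch of the "Actor-Critic" step itself, assuming PyTorch; the layer sizes, activations, and two-layer networks are illustrative assumptions, not the paper's exact architecture. The Actor maps the encoded positive interactions to an approximate recommendation in the movie-embedding space, and the Critic concatenates the state with that recommendation to produce a Q-value.

```python
# Minimal Actor-Critic sketch (assumed PyTorch implementation, sizes illustrative).
import torch
import torch.nn as nn

ITEM_DIM, STATE_DIM, HIDDEN = 32, 96, 128  # consistent with the sketches above

class Actor(nn.Module):
    """Maps a state (encoded positive interactions) to an approximate
    recommendation vector in the movie-embedding space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ITEM_DIM), nn.Tanh())

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Concatenates the state with the Actor's approximate recommendation
    and outputs a scalar Q-value estimating how good it is."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ITEM_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

actor, critic = Actor(), Critic()
state = torch.randn(1, STATE_DIM)       # one user's encoded positive interactions
proto_action = actor(state)             # approximate recommendation
q_value = critic(state, proto_action)   # scalar estimate of its long-term value
```

In setups like this, the final recommendation is often the catalog movie whose embedding lies nearest to the Actor's output, though the abstract does not specify that step.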
The proposed approach demonstrates strong convergence, up to 98% on large data sets, and does not require a long time to train the neural network model. The methodology combines a dynamic approach to generating recommendations with an understanding of user actions as a sequential process and a focus on long-term reward. Recommendation of this kind is very promising and can be used by large companies to maximize their profits from selling different types of goods or information.
