Search Personalization Using Machine Learning

Hema Yoganarasimhan

doi:10.2139/ssrn.2590020

Abstract

Query-based search is commonly used by many businesses to help consumers find information/products on their websites. Examples include search engines (Google, Bing), online retailers (Amazon, Macy's), and entertainment sites (Hulu, YouTube). Nevertheless, a significant portion of search sessions are unsuccessful, i.e., do not provide information that the user was looking for. We present a machine learning framework that improves the quality of search results through automated personalization based on a user's search history. Our framework consists of three modules -- (a) Feature generation, (b) NDCG-based LambdaMART algorithm, and (c) Feature selection wrapper. We estimate our framework on large-scale data from a leading search engine using Amazon EC2 servers. We show that our framework offers a significant improvement in search quality compared to non-personalized results. We also show that the returns to personalization are monotonically, but concavely increasing with the length of user history. Next, we find that personalization based on short-term history or within-session behavior is less valuable than long-term or across-session personalization. We also derive the value of different feature sets -- user-specific features contribute over 50% of the improvement and click-specific over 28%. Finally, we demonstrate scalability to big data and derive the set of optimal features that maximize accuracy while minimizing computing speed.

Full Text