Abstract

Item ranking is central to a social media platform's success. The order in which posts, videos, messages, comments, ads, used products, and notifications are presented to users greatly affects the time they spend on the platform, how often they return, how much they interact with one another, and the quantity and quality of the content they post. To this end, item ranking algorithms use models that predict the likelihood of different events, e.g., the user liking, sharing, or commenting on a video, clicking or converting on an ad, or opening the platform's app from a notification. Unfortunately, by relying solely on such event-prediction models, social media platforms tend to over-optimize for short-term objectives and ignore long-term effects. In this paper, we propose an approach that aims to improve the long-term impact of item ranking. The approach primarily relies on an ML model that predicts negative user experiences. The model utilizes all available UI events: the details of an action can reveal how positive or negative the user experience has been; for example, a user who writes a lengthy report asking for a given video to be taken down has likely had a very negative experience. Furthermore, the model takes into account detected integrity (e.g., hostile speech or graphic violence) and quality (e.g., click or engagement bait) issues with the content. Note that such issues can be perceived very differently by different users. Therefore, developing a personalized model, where a prediction refers to a specific user for a specific piece of content at a specific point in time, is a fundamental design choice in our approach. Besides the personalized ML model, our approach consists of two more pieces: (a) the way the personalized model is integrated with an item ranking algorithm and (b) the metrics, methodology, and success criteria for the long-term impact of detecting and limiting negative user experiences. Our evaluation process uses extensive A/B testing on the Facebook platform: we compare the impact of our approach in treatment groups against production control groups. The A/B test results indicate a 5% to 50% reduction in hides, reports, and submitted feedback. Furthermore, we compare against a baseline that omits some of the crucial elements of our approach: the comparison shows that our approach achieves a 30x to 100x lower false positive ratio than the baseline. Lastly, we present results from a large-scale survey, where we observe a statistically significant improvement of 3% to 6% in users' sentiment regarding content suffering from nudity, clickbait, false/misleading, witnessing-hate, and violence issues.
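The abstract leaves the integration mechanism (piece (a)) unspecified. As a minimal sketch, assuming a conventional weighted linear value model, the personalized negative-experience prediction could enter the ranking score as a penalty term; all event names, weights, and probabilities below are illustrative assumptions, not values from the paper.

```python
# Hypothetical event-prediction outputs for one (user, item) pair,
# e.g., probabilities of like, comment, share. Names and numbers
# are illustrative, not taken from the paper.
event_probs = {"like": 0.12, "comment": 0.03, "share": 0.01}
event_weights = {"like": 1.0, "comment": 4.0, "share": 6.0}

def ranking_score(event_probs, event_weights, p_negative, negative_weight=8.0):
    """Combine engagement predictions with a personalized
    negative-experience prediction into a single ranking value.

    p_negative is the model's probability that this specific user has a
    negative experience with this specific item at this time; subtracting
    a weighted penalty demotes items likely to cause such experiences.
    """
    engagement = sum(event_weights[e] * p for e, p in event_probs.items())
    return engagement - negative_weight * p_negative

# An item with a high predicted negative-experience probability is
# ranked lower even when its engagement predictions are strong.
print(ranking_score(event_probs, event_weights, p_negative=0.02))  # ~0.14
print(ranking_score(event_probs, event_weights, p_negative=0.30))  # negative
```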
