Today, targeted online advertising relies on unique identifiers assigned to users through third-party cookies--a practice at odds with user privacy. While the web and advertising communities have proposed solutions that we refer to as interest-disclosing mechanisms, including Google's Topics API, an independent analysis of these proposals in realistic scenarios has yet to be performed. In this paper, we attempt to validate the privacy (i.e., preventing unique identification) and utility (i.e., enabling ad targeting) claims of Google's Topics proposal in the context of realistic user behavior. Through new statistical models of the distribution of user behaviors and resulting targeting topics, we analyze the capabilities of malicious advertisers observing users over time and colluding with other third parties. Our analysis shows that even in the best case, individual users' identification across sites is possible, as 0.4% of the 250k users we simulate are re-identified. These guarantees weaken further over time and when advertisers collude: 57% of users with stable interests are uniquely re-identified when their browsing activity has been observed for 15 epochs, increasing to 75% after 30 epochs. While measuring that the Topics API provides moderate utility, we also find that advertisers and publishers can abuse the Topics API to potentially assign unique identifiers to users, defeating the desired privacy guarantees. As a result, the inherent diversity of users' interests on the web is directly at odds with the privacy objectives of interest-disclosing mechanisms; we discuss how any replacement of third-party cookies may have to seek other avenues to achieve privacy for the web.
Read full abstract