Abstract

Current web tracking practices pose a constant threat to the privacy of Internet users. As a result, the research community has recently proposed different tools to combat well-known tracking methods. However, the early detection of new, previously unseen tracking systems is still an open research problem. In this paper, we present TrackSign+, a novel approach to discovering new web tracking methods. The main idea behind TrackSign+ is the use of code fingerprinting to identify common pieces of code shared across multiple domains. To detect tracking fingerprints, TrackSign+ builds a novel 4-mode network graph that captures the relationships among domains, URLs, online resources, and code fingerprints. We evaluated TrackSign+ on the 1.5M most popular Internet domains, covering more than 45M web resources from almost 77M HTTP requests. Our results show that our method can detect new web tracking resources with high precision (over 92%). TrackSign+ was able to detect more than 300k new trackers, 800k new tracking resources, and 4.5M new tracking URLs that were not yet detected by the most popular pattern lists at the time. Finally, we also validated the effectiveness of TrackSign+ with more than 20 years of historical data from the Internet Archive.
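
As a rough illustration of the idea of finding code shared across multiple domains, the following minimal sketch hashes whitespace-normalized code and keeps the fingerprints observed on at least two domains. It is not the TrackSign+ implementation: the SHA-256 fingerprint, the whitespace normalization, the `(domain, url, code)` input format, and the names `fingerprint`, `shared_fingerprints`, and `min_domains` are all assumptions made for this example, and the 4-mode graph is not modeled.

```python
import hashlib
from collections import defaultdict


def fingerprint(code: str) -> str:
    """Hash whitespace-normalized code (an assumed, simplistic fingerprint)."""
    normalized = " ".join(code.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shared_fingerprints(observations, min_domains=2):
    """Map each fingerprint to the domains serving it, keeping shared ones only.

    observations: iterable of (domain, url, code) tuples (hypothetical format).
    min_domains: assumed threshold for calling a piece of code "shared".
    """
    domains_by_fp = defaultdict(set)
    for domain, _url, code in observations:
        domains_by_fp[fingerprint(code)].add(domain)
    return {
        fp: domains
        for fp, domains in domains_by_fp.items()
        if len(domains) >= min_domains
    }


# Toy data: the first two resources differ only in whitespace.
observations = [
    ("a.example", "https://a.example/t.js", "function track(){ send(id); }"),
    ("b.example", "https://b.example/x.js", "function  track(){\n send(id); }"),
    ("c.example", "https://c.example/app.js", "console.log('hello');"),
]
shared = shared_fingerprints(observations)
```

In this toy data, only the code served by `a.example` and `b.example` is flagged as shared. Sharing alone does not imply tracking, since common libraries are also served from many domains, so a real system would need a further step to decide which shared fingerprints are tracking-related.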
