Websites commonly rely on services provided by third-party sites to track users and provide personalized experiences. Unfortunately, this practice has strong implications for both user privacy and performance. On the one hand, the privacy of individuals is at risk, since the collected information can be used to reconstruct personal profiles. On the other hand, many existing privacy countermeasures implemented in Web browsers exhibit performance issues, mainly because they rely on huge lists of privacy-intrusive resources to be filtered out, and these lists are difficult to keep up to date. To overcome these limitations, we propose a hybrid mechanism that combines blacklisting and machine learning to automatically identify privacy-intrusive services requested while browsing Web pages. The idea is to use the blacklisting technique (adopted by the majority of privacy tools) in combination with a machine learning model that distinguishes between malicious and functional resources and updates the blacklist accordingly. We found that machine learning models are able to classify JavaScript programs and HTTP requests with accuracy of up to 91% and 97%, respectively. We provide a prototype implementation of this hybrid mechanism, named GuardOne, and we performed an exhaustive evaluation study to assess its effectiveness and performance. Results show that GuardOne is able to filter out malicious resources from users’ requests without performance degradation when compared with traditional systems that rely on static lists for filtering. Moreover, the effectiveness results show that our mechanism, even with some small improvements, is able to efficiently filter out malicious requests and substantially reduce personal information leakage.
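
The hybrid idea described above (check the blacklist first, fall back to a classifier, and feed positive classifications back into the blacklist) can be illustrated with a minimal sketch. This is not the GuardOne implementation: the `HybridFilter` class, its method names, and the keyword-based `toy_classifier` are hypothetical stand-ins, and a real deployment would plug in a trained model for JavaScript programs or HTTP requests.

```python
from typing import Callable, Iterable
from urllib.parse import urlparse


class HybridFilter:
    """Illustrative blacklist + classifier filter (hypothetical, not GuardOne)."""

    def __init__(self, blacklist: Iterable[str], classify: Callable[[str], bool]):
        # Hostnames already known to be privacy intrusive.
        self.blacklist = set(blacklist)
        # Assumed interface: returns True when a request looks privacy intrusive.
        self.classify = classify

    def should_block(self, url: str) -> bool:
        host = urlparse(url).hostname or ""
        # Fast path: the host is already on the blacklist.
        if host in self.blacklist:
            return True
        # Slow path: ask the classifier, then update the blacklist.
        if self.classify(url):
            if host:
                self.blacklist.add(host)
            return True
        return False


def toy_classifier(url: str) -> bool:
    # Placeholder heuristic standing in for a trained machine learning model.
    return "track" in url.lower()
```

In this sketch, a request classified as intrusive adds its host to the blacklist, so later requests to the same host are rejected on the fast path without invoking the classifier again.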