Hashing documents during the Cookiepocalypse

10 March 2021 · simhash, third-party cookies, online advertising, hashing

The FLoC proposal

As part of their phase-out of third-party cookies and the rollout of the Privacy Sandbox, Google have announced their intention to replace third-party cookies in Chrome. The strategy is to replace each function that third-party cookies serve in web advertising, one by one, with a separate privacy-preserving mechanism.

Federated Learning of Cohorts (FLoC) is the proposed replacement for cross-site user tracking in Google Chrome. To enable targeting based on browsing interests without cookie identifiers, the user's browser computes a cohort identifier, that is, a hash of the browsing history.
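It helps to see why the choice of hash matters. The sketch below is not how FLoC computes cohorts: it assumes the browsing history is simply a set of visited domains and hashes it with SHA-256 through a hypothetical `naive_cohort_id` function, keeping only a few bits as the "cohort".

```python
import hashlib
from typing import Iterable


def naive_cohort_id(domains: Iterable[str], bits: int = 16) -> int:
    """Hypothetical cohort id: SHA-256 of the sorted, de-duplicated domains, truncated to `bits` bits."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    canonical = "\n".join(sorted(set(domains)))  # ignore visit order and repeated visits
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)  # keep the top `bits` bits


if __name__ == "__main__":
    a = ["news.example", "sport.example", "recipes.example"]
    b = ["news.example", "sport.example", "travel.example"]  # differs from `a` by one domain
    print(f"{naive_cohort_id(a):016b}")
    print(f"{naive_cohort_id(b):016b}")
```

Because a cryptographic hash scrambles its input, the two histories above, which share two of their three domains, receive unrelated identifiers. A cohort built this way would only group users with exactly identical histories, which is why a hash that preserves similarity is needed.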

We expect the following to happen:

Enter locality-sensitive hashing techniques

The candidate algorithm for achieving the above goal is SimHash, which search engines commonly use to detect near-duplicate documents. After vectorizing a user's browsing history, the p-bit SimHash algorithm works in the following way:
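As a rough illustration, here is a minimal pure-Python sketch of the classic (Charikar-style) p-bit SimHash applied to a browsing history. The following are our own illustrative assumptions, not necessarily the choices made by Chrome or by floc-simhash: the history is a list of domains, each domain is weighted by its visit count, and the per-feature hash is the low p bits of SHA-256. The helper names `feature_hash`, `simhash` and `hamming` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable


def feature_hash(token: str, p: int) -> int:
    """Map one token to a p-bit integer: the low p bits of its SHA-256 digest."""
    if not 1 <= p <= 256:
        raise ValueError("p must be between 1 and 256 for a SHA-256 based hash")
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << p) - 1)


def simhash(tokens: Iterable[str], p: int = 16) -> int:
    """p-bit SimHash of a bag of tokens, weighting each token by its count."""
    counts = Counter(tokens)  # vectorize the history: token -> number of visits
    totals = [0] * p          # one signed running sum per output bit
    for token, weight in counts.items():
        h = feature_hash(token, p)
        for i in range(p):
            # +weight if bit i of this token's hash is set, -weight otherwise
            totals[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is 1 only when its running sum is positive
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    history_a = ["news.example", "sport.example", "recipes.example", "news.example"]
    history_b = ["news.example", "sport.example", "travel.example", "news.example"]
    fa, fb = simhash(history_a), simhash(history_b)
    print(f"{fa:016b}\n{fb:016b}\nHamming distance: {hamming(fa, fb)}")
```

Each output bit is a weighted vote among the visited domains, so two histories that share most of their domains tend to agree on most bits and end up at a small Hamming distance. An ordinary cryptographic hash of the whole history, by contrast, would change completely after a single extra visit.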

An implementation in Python

When we set out to study the impact of replacing cookies with cohorts in our machine learning pipelines at Hybrid Theory, I was unable to find a pure-Python implementation of the SimHash algorithm that served our purposes. So we wrote one, open-sourced it, and published it as a Python package named floc-simhash.

This implementation has allowed us to conduct a preliminary analysis on our own datasets while we wait for Google's Origin Trial to begin later this month (March 2021).

While waiting for details regarding the final form of the FLoC implementation, we have provided two implementations of SimHash:

For more details and examples, please refer to the README.

Open questions

While we wait for implementation details, the most recent news points towards:

These, however, will be subject to change depending on the Origin Trial results.

More generally, whether FLoC will eventually be adopted remains an open question.

In the next few months, we aim to determine whether the cookiepocalypse forces us to rethink the whole approach to selling ads online or whether, as the original proposal claims, FLoCs perform well enough for the cross-targeting approach to survive on Google Chrome.

