The dawn of the Post-Cookie Era: Thoughts on the Future of Web Analytics
This is a recap of my talk @ MeasureCamp UK.
Are we witnessing the end of the Golden Age of Web Analytics? Is attribution dead? Should we rely on the machines to figure it out?
Before I answer these questions, let’s take a step back to see how we got to this point:
Analysts and Marketers built on that foundation and had the luxury to think about smarter ways to get behavioural data beyond what you get out of the box, from Funnel Analysis to Heatmaps, and from Element Engagement to tracking what users copy to their clipboards!
The unethical harvesting of data by the AdTech industry and the greedy collection of data by Social Media giants, MarTech and Analytics vendors, combined with the never-ending news about data breaches, compromises of users’ data, unethical use of private information by Social Media networks, and the abuse of advertisements as a manipulation tool in political campaigns, shed light on the urgency to protect users’ data, prevent the profiling of users across the web, and hold someone accountable for all this mess!
All the bad news about data privacy had one positive impact: increasing awareness among people (the customers, the supporters, you and me!) about privacy and data protection. A few days ago, DuckDuckGo (the privacy-first search engine) announced that it surpassed 100 million daily search queries, and Signal (a WhatsApp rival) went down under a huge surge of new users after WhatsApp announced it would share users’ data with Facebook. Studies show an increasing trend of AdBlocker use among the internet population (roughly 25.8% of internet users in the US), and a study from 2016 shows that ~11% of users block Analytics tools too.
These numbers will definitely vary by time, industry and market, but the trend is clear: users are becoming more privacy-aware and an increasing portion of the web traffic is going dark!
Governments acted by imposing laws to protect their citizens: from the General Data Protection Regulation (GDPR) in Europe to the California Consumer Privacy Act (CCPA), and similarly in Brazil, India, New Zealand, and the list goes on.
Most of the regulations share a few common themes: Transparency, Accountability, a Clear Purpose for Data Collection, Data Rights and Clear Consent. And while things started messy with some laws, the regulations are getting tighter and stricter. In October 2019, the Court of Justice of the European Union ruled that cookies require explicit consent regardless of whether personal data is being processed, and regulators in the EU and US strengthened the definition of consent to prevent dark patterns that aim to deceive users.
Similarly, browsers moved in the same direction. In 2017, Apple deployed Intelligent Tracking Prevention (ITP) to protect the privacy of its users and reduce cross-site tracking. So who paid the price first? Third-party cookies! (cookies that are set by a domain other than the one a user is visiting, like Advertising networks and Social Media sites).
At first, third-party cookies were allowed to live for 24 hours only; fast-forward to today, and ITP blocks third-party cookies completely by default, limits the lifespan of first-party cookies, limits how companies can leverage localStorage, and caps cookies set via CNAME cloaking at 7 days.
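One practical consequence of these lifespan caps: a client-side identifier can disappear long before the expiry you asked for, so it helps to record your own expiry alongside the value. The sketch below illustrates the idea; the function names (`buildCookie`, `readStoredId`) are illustrative, not a real library's API.

```javascript
// Sketch: ITP may cap script-set first-party cookies (and localStorage)
// to 7 days, however long a Max-Age you request. Storing our own
// timestamp lets us tell "expired by design" apart from "evicted early".

function buildCookie(name, value, maxAgeDays) {
  // We can ask for a long Max-Age, but the browser may cap it anyway.
  const maxAge = maxAgeDays * 24 * 60 * 60;
  return `${name}=${encodeURIComponent(value)}; Max-Age=${maxAge}; Path=/; SameSite=Lax`;
}

function wrapWithExpiry(value, ttlDays, now = Date.now()) {
  // Keep our own expiry inside the stored payload.
  return JSON.stringify({ value, expiresAt: now + ttlDays * 86400 * 1000 });
}

function readStoredId(raw, now = Date.now()) {
  if (!raw) return null; // evicted by the browser before our expiry
  const { value, expiresAt } = JSON.parse(raw);
  return now < expiresAt ? value : null; // expired by our own clock
}
```

In other words: treat any client-side identifier as short-lived, and make its lifetime explicit in your own data rather than trusting the browser to honour it.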
Btw, Apple also followed these measures with strict fencing of its ecosystem, including limiting access to the Identifier for Advertisers (IDFA), the unique identifier of its mobile devices. That was very unpleasant news for many, including Facebook, which ironically went old-school and expressed its anger in traditional newspapers!
Anyway, back to the browsers:
While Apple’s measures caused the largest ripples due to Safari’s significant market share, other browsers took even more aggressive measures: Firefox with Enhanced Tracking Protection (ETP), and Brave, which promotes itself as a privacy-first browser with its Shields feature (Brave announced yesterday that it now supports the IPFS protocol, which enables a decentralized web). Microsoft Edge is trying to catch up with its Tracking Prevention feature, but is still far behind.
Seems we missed someone!.. Oh yeah, we forgot Google Chrome. Chrome announced limitations on third-party cookies with the “Privacy Sandbox” initiative, which should be fully deployed within two years. The initiative aims to “sustain a healthy, ad-supported web in a way that will render third-party cookies obsolete”.
Btw, did you know that Ads represented 83.3% of Google’s revenue in 2019?.. Not related, but just sayin’! ;)
So to wrap up this part here:
- Cookies are generally becoming less reliable.
- Farewell to third-party cookies.
- First-party cookie lifespan is now set by the browser instead of the origin.
Cookieless tracking solutions are rising, and we’re witnessing the dawn of the Post-Cookie tracking.
All the above-mentioned measures by users, regulators and browsers lead to a set of pivotal impacts:
1. Inaccurate key metrics, like New vs. Returning, and an inflated count of Unique Users. This is a result of the limitations on cookie lifespan: returning visitors whose cookies have expired are counted as new. You should be able to see this already among Safari users.
2. Dark Traffic. A significant portion of traffic will be missing, completely hidden from your analytics tool. This is a result of the increasing use of AdBlockers and of browsers that block trackers by default, including blocking tracking snippets and blocking requests to tracking domains.
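One rough way to size your dark traffic is to compare server-side request counts against client-side analytics pageviews for the same pages and period. A minimal sketch (the numbers in the comment are hypothetical):

```javascript
// Sketch: estimate the share of traffic invisible to a client-side
// analytics tool by comparing it to server-side counts for the same
// pages and time window. Assumes bot traffic has already been filtered
// from the server numbers, otherwise the estimate is inflated.
function darkTrafficShare(serverHits, analyticsPageviews) {
  if (serverHits <= 0) throw new Error('no server-side data to compare against');
  return Math.max(0, 1 - analyticsPageviews / serverHits);
}

// e.g. 100,000 requests in the server logs but only 74,000 tracked
// pageviews suggests roughly 26% of traffic is dark.
```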
3. Unreliable Attribution. While we already struggle with multi-touch attribution, things will only get worse, and most conversions will likely be attributed to last-click/first-touch. The average conversion window in your business will determine how badly you’re affected.
4. A/B testing data-quality issues. As with attribution, the data quality of your experiments will depend on their duration. A/B testing tools will not be able to identify the same user beyond 7 days, 24 hours, or at all (depending on the browser and other conditions); therefore, the same user might see different variations over the course of the same test.
Similarly, that would impact personalized campaigns and cookie-based segmentation.
Generally speaking, there is an inverse relationship between the look-back window and the accuracy of whatever you’re doing: the longer the duration, the less reliable your data gets.
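To make that inverse relationship concrete, here is a toy sketch with hypothetical numbers: given the distribution of your conversion lags, the share of conversions whose lag exceeds the capped cookie lifespan is the share you can no longer attribute to the original touchpoint.

```javascript
// Toy model: what fraction of conversions happen after the cookie that
// linked the user to the original touchpoint has already expired?
// conversionLagsDays: days between first touch and conversion (sample data).
function misattributedShare(conversionLagsDays, cookieCapDays) {
  const lost = conversionLagsDays.filter(d => d > cookieCapDays).length;
  return lost / conversionLagsDays.length;
}

// With lags of 1, 3, 10 and 30 days and a 7-day cap, half of the
// conversions lose their original attribution.
```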
Before we talk about technical solutions, let’s start with the most important one of all: a mindset change! Analysts should Embrace Agility, Focus on What Matters, Break the Silos and Stop Thinking of Workarounds.
Embrace Agility:
With the rapid changes in regulations, browsers and users’ behaviour, Web Analytics requires agility, where we don’t set and forget. Continuous improvement is key to adapting to the ever-changing landscape.
Focus on what matters:
Passive collection of data is coming to an end, one way or another!
We are used to the paradigm of “get it all and think later”, where we collect everything that can be collected, then think about use cases for it afterwards. Regulations now put a lot of limitations on this approach to prevent unnecessary digital fingerprinting. And while this might appear as a limitation, it also acts as a filter for choosing the metrics that matter. That requires rethinking your KPIs, because the value of Web Analytics is defined by how you use the data, not by what you collect.
Break the Silos:
The success of Web Analytics in the face of the increasing challenges requires widening its scope more than ever. The function of Web Analytics and its expected ROI in any organization depends on an understanding and alignment with IT, Marketing, BI and the Executive Suite.
Stop Thinking of Workarounds!
Marketers and advertisers have been coming up with workarounds since the release of ITP in 2017. But with the release of ITP 2.3, it’s clear that looking for technical workarounds is not a sustainable strategy.
Instead of fighting against the inevitable change, we should embrace it.
And then to the technical solutions:
Switch to Server-side
Yup, fashion is not the only thing that trends backwards these days!
Web Analytics started with server log files back in the early days of the internet, and here we go, back to server-side, but with a twist this time: in server-side tracking, the data is rich (formatted and sessionized), cleaned and ready for analysis and answering business questions. Moreover, tools like server-side Google Tag Manager let you take this even further by moving vendor requests (analytics tools, social media pixels) from the client to a server (on Google Cloud Platform, in this case) that lives on your own subdomain, which gives you a few great benefits:
- Minimized impact of browser measures (ITP, ETP, etc.), because most requests will be first-party to your own domain.
- Better data control. On the server side, you control what’s being sent to the vendors, e.g. hiding users’ IPs.
- The workload (all the tracking pixels) moves to the server side, which should improve the client-side experience.
As client-side tracking becomes less reliable, server-side gives you full control of what’s being collected and what’s being sent to vendors, unaffected by browser measures or ad blockers. And while this is great for overcoming many limitations, it is also risky, because what happens server-side stays server-side, in the dark! Users (and their browsers) will not be able to see how the data is processed or where it is sent, which requires organizations to act with transparency, accountability and respect for users’ choices.
It is also important to note that some behavioural metrics might not be accessible server-side.
Own your data-pipeline
While existing third-party tools like Google Analytics provide premium features and handy functionality out of the box for free (well, not really free, but let’s pretend it is for a moment), regulators are increasingly putting such tools behind a consent wall to protect users’ privacy.
The GDPR in Europe, for example, requires explicit consent in the case of Google Analytics (the user must accept being tracked before any tracking code that collects Personally Identifiable Information fires), unless you apply the following:
- Anonymization of IPs
- Deactivation of User-ID
- Unlinking Google Products (like Advertising networks)
- Deactivating Data Sharing with Google
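As a rough sketch only (verify the exact field names against Google's current gtag.js documentation before relying on them), the first three items map to a configuration like:

```javascript
// Sketch of a consent-cautious gtag.js configuration; 'UA-XXXXX-Y' is a
// placeholder property ID, and field names should be double-checked
// against Google's current documentation.
gtag('config', 'UA-XXXXX-Y', {
  anonymize_ip: true,                       // anonymization of IPs
  allow_google_signals: false,              // unlink Google advertising products
  allow_ad_personalization_signals: false   // no ad-personalization data
  // note: no user_id set, so the User-ID feature stays deactivated
});
```

The last item, deactivating Data Sharing with Google, is not a tag parameter at all; it lives in the Google Analytics Admin settings for the account/property.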
In the age of Big Data and Machine Learning, large amounts of seemingly anonymized data can be turned into a unique digital fingerprint that can be used to profile users and track them across the web!
Therefore, I expect more legal restrictions on running Google Analytics without explicit consent in the future. Additionally, you should consider the hidden cost of sharing your users’/supporters’ behaviour with Google: once explicit consent is required, a larger portion of your users will be completely hidden from you! (In some countries, up to 90% of users don’t consent to being tracked.)
In some situations this might be justified, especially if Google Ads is crucial for your business; in others, it might not! So the question becomes: is it worth it? The same applies to Facebook and other marketing platforms.
Therefore, I believe that an open-source, self-hosted analytics solution should be the go-to if you have the resources. There are currently many options, like Snowplow and Matomo On-Premise, that allow you to own your data, avoid the passive collection of data and avoid unnecessarily sharing data with third parties.
A hybrid approach, and final thoughts
Owning your pipeline with self-hosted open-source solutions is future-proof; however, it comes at a price: building and maintaining the pipelines, from implementation (hosting and data collection) to processing and reporting, requires in-house technical expertise and/or managed services that also come at a price. That includes the hidden costs of learning, consultations and staff, which requires calculating the return on investment.
On the other hand, you have free tools like Google Analytics that provide premium features and enterprise-level services in an end-to-end solution, from hosting and data collection to reporting, but that also comes at a price: you get a lot of information about your visitors, but you don’t get it alone; the vendor does too! And with that, you get an uncertain future and unnecessary exposure of users’ data to big tech.
Therefore, I believe the best solution is a hybrid approach: your own analytics pipeline that guarantees the tracking of the essential metrics you need for the success of your business, and at the same time, third-party tools that give you state-of-the-art analysis and reporting functionality, but that require explicit consent from users and are therefore limited to a sample of your data.
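The hybrid routing rule can be sketched in a few lines: essential, first-party measurement always runs, while the third-party tool only receives anything after explicit consent. The function and destination names below are illustrative.

```javascript
// Sketch of hybrid-approach event routing: the first-party pipeline gets
// every (minimal, essential) event; the third-party vendor only gets
// events when the user has explicitly consented to analytics.
function routeEvent(event, consent) {
  const destinations = [{ to: 'first-party-pipeline', event }];
  if (consent && consent.analytics === true) {
    destinations.push({ to: 'third-party-vendor', event });
  }
  return destinations;
}
```

A side effect of this design is that the third-party tool sees only the consenting sample, so its reports should be read as a sample, while the first-party pipeline remains the source of truth for the essential metrics.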
“Privacy is not an option, and it shouldn’t be the price we accept for just getting on the internet.” — Gary Kovacs
And I believe the same is true for Analytics. The privacy of your users, the supporters of your organization, you and me, shouldn’t be the price we accept to do analytics!
Published on my blog, and reformatted for medium.