Five Event Tracking Pitfalls to Avoid

28.09.23 • 7 min read

At Tasman, we’ve had the privilege of partnering with 40+ organisations over the past 6 years. These businesses have ranged in size and complexity: some had existing (and mature) data capabilities and others were completely greenfield. However, one challenge has been common across all of them - tracking.

The term tracking generally encapsulates two main activities:

Data Creation - the process of designing and implementing events that track user interactions and behaviours within your products. Data can be created anywhere users interact with your product, such as websites and mobile apps.
Data Collection - the process of implementing a pipeline that can capture event-based data generated by different systems within the stack. Sources of data to be collected include backend APIs, customer engagement systems and mobile measurement partners (MMPs).

Whether you’re looking at putting in a customer/behavioural data platform (Snowplow, Segment, Rudderstack), a product analytics tool (Amplitude, Mixpanel, mParticle) or a customer engagement tool (Braze, Iterable), you are creating or collecting data.

Whilst each of these tools targets different use cases (though they are all the best at everything if you believe their marketing teams), they all involve elements of tracking.

We have summarised our experiences and compiled a list of the most common issues we see with tracking deployments.

Poorly Planned Tracking

Building an effective tracking plan is far more difficult than it seems. When setting out to understand how users are interacting with a product or service, many teams fall foul of the ‘track everything’ mentality. This approach generally materialises in two ways:

The ‘No Properties’ approach (very common). Every button, page and potential interaction gets its own event. Properties are all but forgotten (mainly because the events are so hyper-specific that the properties are baked into the event name).
The ‘All the properties’ approach (less common). Rather than implementing well-considered analytics events, API call payloads containing hundreds of lines of JSON data get dumped into events. This results in a deeply nested mess of properties that requires complex data models to get any value from, and couples the events more tightly to the backend architecture (which is something we try to avoid). This is effectively data exhaust (credit to Yali Sassoon @ Snowplow).

Both of these approaches cut corners on planning and implementation, which is often motivated by a lack of buy-in and ownership from the engineering teams responsible for the implementation - more on that later.

It’s counterintuitive, but in our experience less is more when it comes to tracking. Too many events or too many properties, and maintenance becomes a huge burden.
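
As a rough illustration of ‘less is more’, the sketch below re-expresses three ‘No Properties’ style events as a single generic event with a handful of properties. The event names, property names, the LOCATIONS lookup and the to_generic_event helper are all hypothetical, not taken from a real tracking plan.

```python
# Hypothetical 'No Properties' events: all the context lives in the event name.
hyper_specific_events = [
    "homepage_hero_signup_button_clicked",
    "pricing_page_footer_signup_button_clicked",
    "blog_sidebar_signup_button_clicked",
]

# Hypothetical mapping from the name prefix to well-defined properties.
LOCATIONS = {
    "homepage_hero": {"page": "homepage", "placement": "hero"},
    "pricing_page_footer": {"page": "pricing", "placement": "footer"},
    "blog_sidebar": {"page": "blog", "placement": "sidebar"},
}

SUFFIX = "_signup_button_clicked"


def to_generic_event(name: str) -> dict:
    """Re-express a hyper-specific event name as one event plus properties."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"unrecognised event name: {name}")
    location = name[: -len(SUFFIX)]
    return {
        "event": "button_clicked",
        "properties": {"button": "signup", **LOCATIONS[location]},
    }


generic_events = [to_generic_event(name) for name in hyper_specific_events]

# Three event names collapse into one; the detail moves into properties.
distinct_names = {event["event"] for event in generic_events}
```

With this shape, a new signup button in a new location needs a new property value rather than a new event definition.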

You can read more about how we do tracking plans at Tasman here.

Poorly Implemented Tracking

A perfect tracking plan is useless if it’s not implemented correctly. Ensuring events fire at the right time, every time and that those events and their properties are consistently named, cased and typed is absolutely critical to obtaining trustworthy insights. How do you know if your new feature is underperforming or not if you cannot be sure that the tracking is implemented correctly?

There are a couple of areas (that do overlap) that we see engineering teams struggling with most often:

Cookies - Cookies are critical to maintaining identity between pages and sessions, and since the introduction of GDPR, organisations have been obliged to ask users which cookies they would like to opt in to. This makes implementing tracking difficult, as engineering teams have to navigate organisational policies around data privacy. Far too often we see teams losing critical insights into the earliest parts of their acquisition funnels because they didn’t quite get this right.
Identities - somewhat linked to the above, identifying your users is critical to many types of analysis. If you aren’t able to connect user interactions together, then the value of the data collected is far more limited. For example, if you don’t know what your user did before they signed up to your product, how do you know which customer acquisition activities are providing the highest ROI?
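
To make the identity point concrete, here is a minimal sketch of stitching pre-signup activity to a known user. It assumes every event carries a cookie-based anonymous_id and that user_id is only populated once the user has signed up; the field names and sample events are hypothetical.

```python
# Hypothetical event stream: pre-signup events only carry an anonymous (cookie) ID.
events = [
    {"anonymous_id": "anon-1", "user_id": None, "event": "page_viewed"},
    {"anonymous_id": "anon-1", "user_id": None, "event": "ad_clicked"},
    {"anonymous_id": "anon-1", "user_id": "user-42", "event": "signed_up"},
    {"anonymous_id": "anon-2", "user_id": None, "event": "page_viewed"},
]


def build_identity_map(events: list[dict]) -> dict[str, str]:
    """Map each anonymous ID to the first user ID seen alongside it."""
    mapping: dict[str, str] = {}
    for event in events:
        if event["user_id"] is not None:
            mapping.setdefault(event["anonymous_id"], event["user_id"])
    return mapping


def stitch(events: list[dict]) -> list[dict]:
    """Attach a stitched_user_id to every event, back-filling pre-signup activity."""
    mapping = build_identity_map(events)
    return [
        {**event, "stitched_user_id": event["user_id"] or mapping.get(event["anonymous_id"])}
        for event in events
    ]


stitched = stitch(events)
```

Here the ad click that happened before signup ends up attributed to user-42, while anon-2 stays unidentified. If the anonymous ID is lost (for example through a mishandled cookie consent flow), there is nothing to join on and the pre-signup history cannot be recovered.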

It is unfair to put all the responsibility for getting implementation correct onto the engineering teams, which is why it’s important to make sure there is an effective validation and feedback mechanism between the plan and implementation.
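
One possible shape for such a validation mechanism is an automated check of each event against the tracking plan. The sketch below is illustrative only: the TRACKING_PLAN structure, the snake_case naming rule and the validate_event helper are assumptions, not a description of any particular tool.

```python
import re

# Hypothetical tracking plan: event name -> expected properties and their types.
TRACKING_PLAN = {
    "order_completed": {"order_id": str, "total": float, "item_count": int},
}

# Assumed naming convention: lowercase snake_case event names.
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z]+)*$")


def validate_event(name: str, properties: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the event passes."""
    errors = []
    if not SNAKE_CASE.match(name):
        errors.append(f"event name '{name}' is not snake_case")
    spec = TRACKING_PLAN.get(name)
    if spec is None:
        errors.append(f"event '{name}' is not in the tracking plan")
        return errors
    # Check every planned property is present and has the planned type.
    for prop, expected_type in spec.items():
        if prop not in properties:
            errors.append(f"missing property '{prop}'")
        elif not isinstance(properties[prop], expected_type):
            errors.append(f"property '{prop}' should be {expected_type.__name__}")
    # Flag anything that was sent but never planned.
    for prop in properties:
        if prop not in spec:
            errors.append(f"unexpected property '{prop}'")
    return errors


ok = validate_event("order_completed", {"order_id": "A1", "total": 19.99, "item_count": 2})
bad = validate_event("order_completed", {"order_id": "A1", "total": "19.99"})
```

A check like this could run in CI or against a staging environment, so that naming, casing and typing problems are fed back to engineers before they reach production data.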

Lack of Ownership

Tracking is a naturally multi-disciplinary activity, and stakeholders of the data may sit across many departments. Despite this, or perhaps because of it, tracking commonly doesn’t have a natural owner within most small and medium businesses (larger businesses generally have the budget for a dedicated tracking team, though where that team sits in the organisation is often debated).

Product teams may own product analytics tooling such as Amplitude or Mixpanel. Marketing may own GA4, Appsflyer, Braze or Iterable. Engineering owns the codebase where tracking is deployed. All of these teams have their own agendas and objectives, which often results in each tool being deployed in isolation. The outcome is multiple event pipeline implementations with inconsistent outputs, and a huge burden for the engineering teams who need to maintain them (and often don’t have the bandwidth to stay on top of things). Not to mention the significant cost.

To avoid this, businesses need to approach tracking strategically and cross-functionally, building a scalable event pipeline architecture that meets the needs of all stakeholders.

We’ll be posting shortly about the differences between traditional and composable CDP strategies so stay tuned for that.

Open Source Cost Fallacy

Budget constraints can often drive organisations to consider open source tooling for tracking, such as Snowplow OS and Rudderstack OS, or in the worst-case scenario, to take on building a bespoke event pipeline.

Whilst open source has its place, our experience is that in almost all scenarios implementing and maintaining an open source solution, or building a bespoke one, carries a much larger total cost of ownership than a managed product. Engineers are expensive, and building and maintaining a truly scalable event pipeline is complex and time-consuming. It’s rarely worth the risk.

Open source is free, but expensive.

Most tracking solutions have free tiers, with pricing that scales as your event or user numbers grow. It can get pricey, but thinking ahead about your use case and estimating your events per user (now and in the medium term) will enable you to pick a product whose pricing model, event-based or user-based, scales better for you. Alternatives like Snowplow Enterprise have a different pricing model that can offer good long-term value if you have significant event volumes.
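
The events-per-user estimate can be turned into a quick back-of-the-envelope comparison. The sketch below assumes simple linear pricing with no free tier or volume discounts, and every price in it is made up for illustration; real vendor pricing will differ.

```python
def event_based_cost(monthly_users: int, events_per_user: float,
                     price_per_million_events: float) -> float:
    """Monthly cost when the vendor charges per event (linear pricing assumed)."""
    return monthly_users * events_per_user / 1_000_000 * price_per_million_events


def user_based_cost(monthly_users: int, price_per_thousand_users: float) -> float:
    """Monthly cost when the vendor charges per tracked user (linear pricing assumed)."""
    return monthly_users / 1_000 * price_per_thousand_users


def break_even_events_per_user(price_per_million_events: float,
                               price_per_thousand_users: float) -> float:
    """Events per user at which both pricing models cost the same."""
    return price_per_thousand_users * 1_000 / price_per_million_events


# Made-up prices purely for illustration.
PRICE_PER_MILLION_EVENTS = 50.0
PRICE_PER_THOUSAND_USERS = 10.0

threshold = break_even_events_per_user(PRICE_PER_MILLION_EVENTS, PRICE_PER_THOUSAND_USERS)
light_usage = event_based_cost(100_000, 150, PRICE_PER_MILLION_EVENTS)
heavy_usage = event_based_cost(100_000, 300, PRICE_PER_MILLION_EVENTS)
per_user = user_based_cost(100_000, PRICE_PER_THOUSAND_USERS)
```

Under these assumed prices, event-based pricing is cheaper while users generate fewer events than the break-even threshold, and user-based pricing wins above it. This is also why trimming unnecessary events has a direct effect on cost under event-based pricing.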

All that being said, the single best way to keep costs down is to not fall for the ‘track everything’ approach. Otherwise, the vast majority of events won’t be analysed and won’t provide any value, yet will account for the majority of the cost.

Extracting value from the collected data

Ultimately, this is what it’s all about, but it’s a common area of difficulty.

We’ve had numerous enquiries from engineering teams that are struggling with sluggish pipelines and large data warehouse bills as they attempt to structure and organise billions of rows of event data. Those unfortunately stuck on Redshift tend to have the biggest challenge, but it’s often the case with Snowflake and BigQuery too. We’ll talk about techniques to handle this at a later point (hint: minimise expensive deduplication and avoid merge-based incremental materialisations in favour of append-only strategies).
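
As one possible reading of the append-only hint, the sketch below loads only rows newer than a high-water mark and never updates existing rows, in contrast to a merge that has to compare every incoming row against existing keys. It uses plain Python lists to stand in for warehouse tables, and it assumes a load timestamp column (called loaded_at here, a hypothetical name) that only ever increases between batches.

```python
from datetime import datetime


def append_only_load(target: list[dict], incoming: list[dict]) -> int:
    """Append rows newer than the target's high-water mark; never update in place.

    Assumes loaded_at increases from batch to batch, so rows at or below
    the current maximum have already been loaded.
    """
    high_water_mark = max((row["loaded_at"] for row in target), default=datetime.min)
    new_rows = [row for row in incoming if row["loaded_at"] > high_water_mark]
    target.extend(new_rows)
    return len(new_rows)


# Hypothetical sample data standing in for a warehouse table and a new batch.
warehouse_table = [
    {"event_id": "e1", "loaded_at": datetime(2023, 9, 1, 10, 0)},
    {"event_id": "e2", "loaded_at": datetime(2023, 9, 1, 10, 5)},
]
batch = [
    {"event_id": "e2", "loaded_at": datetime(2023, 9, 1, 10, 5)},   # already loaded
    {"event_id": "e3", "loaded_at": datetime(2023, 9, 1, 10, 10)},
    {"event_id": "e4", "loaded_at": datetime(2023, 9, 1, 10, 15)},
]

appended = append_only_load(warehouse_table, batch)
```

The filter is a single comparison against one value rather than a key-by-key lookup, which is what keeps this pattern cheap at large volumes. The trade-off is that it relies on the load timestamp assumption above; duplicates that slip through still need handling elsewhere.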

Even when the data can be wrangled into shape, analysing event time series data with traditional BI tools can be difficult. So unless you have significant data capability, we would generally recommend starting out with an all-in-one tool such as Amplitude or Mixpanel. And we’re quite excited by the new data warehouse-native version of Amplitude that is on the horizon.

Tackling these challenges

Collecting event data (whether user, customer, process or otherwise) is a vital activity for nearly every type of business model - even if the only digital product is a website. This type of data can be so powerful that there are those pushing to drop dimensional modelling in its entirety in favour of making all analytics event-based (see activity schema).

Therefore, it is critical that businesses see event collection and tracking as a key strategic activity that is supported with appropriate budgets and leadership buy-in. Once that is in place, the next step is to establish a framework for the design, implementation and maintenance of an appropriate tracking architecture. This is something we’ve had the opportunity to tune at Tasman, and we’ll be sharing our framework in an upcoming post.