The Lego Data Layer: Anatomy of an Interaction

Like every other self-respecting digital analyst, I started playing with Enhanced Ecommerce soon after it became public. I’d read the Developer Guide several times, and while it seemed quite straightforward on the surface, there was something nagging me about the anatomy of the enhanced ecommerce event.

Up until now we’ve been tracking interactions following a seemingly simple structure that closely resembles English grammar (even if we don’t think of it in that way). Credit goes to the guys at Snowplow who have pioneered the concept of event grammar in the digital event analytics community.

Let’s recap how grammar applies to digital analytics events (as I see it, you might disagree — if you do, let me know in the comments).

The fundamental structure of any digital interaction is:

subject + verb + object + context

A common ecommerce example would be:

user added product to basket

Building blocks of an interaction

Subject

Who actively performs the action (almost always the user). “Almost always” because I consider pageviews to be passive events: the page was shown to the user. More on this later.


Object

What the user interacted with directly (we’ll come back to this, it’s really important)


Verb

The action that occurred. Together, the verb + object form a mini summary of the interaction (e.g. “added product”). Often, this is enough to “get” what the interaction was.


Context

When, where, how and what else is involved in the interaction or affected by it. Whereas when, where and how clearly indicate context, “what else” typically includes other objects that are indirectly part of the interaction (e.g. added to basket, where basket is an object). In GA land, context covers technology, channel and URL path dimensions.
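
As a minimal sketch, the four building blocks can be written down as a type. The field names (`subject`, `verb`, `object`, `context`) and the context keys are my own illustrative choices, not part of any Google Analytics or Snowplow API.

```typescript
// Hypothetical shape for an interaction; all names are illustrative only.
interface Interaction {
  subject: string;                 // who performed the action, almost always the user
  verb: string;                    // the action that occurred
  object: string;                  // what the user interacted with directly
  context: Record<string, string>; // when, where, how and what else
}

// "user added product to basket"
const addToBasket: Interaction = {
  subject: "user",
  verb: "added",
  object: "product",
  // The basket is an indirect object, so it lives in the context.
  context: { to: "basket", page: "/product/example", device: "mobile" },
};

// The verb + object pair forms the mini summary of the interaction.
function summarise(i: Interaction): string {
  return `${i.verb} ${i.object}`;
}

console.log(summarise(addToBasket)); // "added product"
```

Note that the basket only shows up inside `context` here, which is exactly the "what else" case described above.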


Traditionally, analytics has been about tracking stuff according to this unwritten event anatomy. The process has mostly been as follows:

Identify the object of the interaction and then:

Track it

Files downloaded, buttons and links clicked, pages seen, etc. Our GA reports are filled with pages and events — stuff that users interacted with directly.


Classify it

Reduce the granularity of data by clustering events and pages into groups and categories to make them easier to “take in” and analyse. This is a core task in most GA setups and typically includes making use of eventCategory, eventAction, content groupings and channel groupings.
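
To make the classify step concrete, here is a rough sketch of how a tracked object might be reduced to Google Analytics event fields. The `TrackedObject` type and the grouping rules are assumptions for illustration; only the field names eventCategory, eventAction and eventLabel come from Google Analytics.

```typescript
// Hypothetical description of something a user interacted with directly.
interface TrackedObject {
  kind: "file" | "button" | "link";
  action: string; // e.g. "download", "click"
  name: string;   // e.g. a file name or a link URL
}

interface GaEvent {
  eventCategory: string;
  eventAction: string;
  eventLabel: string;
}

// Illustrative grouping rules: many kinds of object collapse into a few categories.
const categoryByKind: Record<TrackedObject["kind"], string> = {
  file: "Downloads",
  button: "UI Controls",
  link: "UI Controls",
};

function classify(o: TrackedObject): GaEvent {
  return {
    eventCategory: categoryByKind[o.kind],
    eventAction: o.action,
    eventLabel: o.name,
  };
}

const gaEvent = classify({ kind: "file", action: "download", name: "brochure.pdf" });
console.log(gaEvent);
```

Everything we know about the interaction has to fit into those three strings, which is where the trouble starts.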


Now, let’s try to map out the Enhanced Ecommerce events to this well-structured event anatomy. Bear in mind what Google say about Enhanced Ecommerce:

“Enhanced ecommerce … enables the measurement of user interactions with products on ecommerce websites across the user’s shopping experience…”

Enhanced Ecommerce event | What the interaction really is
-------------------------|-------------------------------------------------
click                    | user clicks on link related to product
detail                   | user shown detail page for product
add                      | user adds product to basket
remove                   | user removes product from basket
checkout                 | user starts checkout for product
checkout_option          | user adds checkout option to checkout
purchase                 | user completes transaction that includes product
refund                   | user submits form to return product
promo_click              | user clicks promo related to product

You see, enhanced ecommerce actions don’t map neatly to our event grammar. In some cases the product is the object of the interaction, but most of the time it’s part of the context. This is a subtle but significant difference. Enhanced Ecommerce recognises that products hold special status in ecommerce analytics. For the first time, products are treated as first-class analytic entities at data collection time (rather than just at analysis time).
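
One way to see the mismatch is to encode the mapping above as data and mark, for each action, whether the product is the direct object or part of the context. The `productRole` assignments are my own reading of that mapping, not anything Google defines.

```typescript
type ProductRole = "object" | "context" | "none";

interface ActionGrammar {
  verb: string;
  object: string;           // what the user interacted with directly
  productRole: ProductRole; // where the product sits in the sentence (my interpretation)
}

const actions: Record<string, ActionGrammar> = {
  click:           { verb: "clicks",    object: "link",            productRole: "context" },
  detail:          { verb: "is shown",  object: "detail page",     productRole: "context" },
  add:             { verb: "adds",      object: "product",         productRole: "object" },
  remove:          { verb: "removes",   object: "product",         productRole: "object" },
  checkout:        { verb: "starts",    object: "checkout",        productRole: "context" },
  checkout_option: { verb: "adds",      object: "checkout option", productRole: "none" },
  purchase:        { verb: "completes", object: "transaction",     productRole: "context" },
  refund:          { verb: "submits",   object: "form",            productRole: "context" },
  promo_click:     { verb: "clicks",    object: "promo",           productRole: "context" },
};

// Actions where the product is what the user acted on directly.
const productIsObject = Object.entries(actions)
  .filter(([, grammar]) => grammar.productRole === "object")
  .map(([actionName]) => actionName);

console.log(productIsObject); // ["add", "remove"]
```

On this reading, only two of the nine actions have the product as their object; everywhere else it rides along as context.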

From interaction objects to business entities

I used the word entity above. It’s the best term I could think of to describe a thing that’s of particular interest to a business to measure performance and behaviour for.

Entity = a thing that’s of particular interest to the business to measure performance and behaviour for.

In ecommerce you would have a few special ones (aside from user/shopper):

  • The product
  • The product category
  • The transaction
  • The shopping visit aka the checkout (I bundle baskets in here too)
  • The promotion
  • The campaign

These form the very fabric of ecommerce and naturally, we often do analysis at instance-level for these entities. We often want to compare how a product has fared against another. Or whether a promotion was more profitable than another.

The question is, would you really care about a link click or form field if it weren’t directly related to one of these entities? Probably not.

So you see, it’s not the button that users clicked on that counts (the object of interaction) but rather the business entity (a specific instance of it) that click is related to. If you subscribe to that notion, then the conclusion comes naturally — we have to link interactions to the business entities they’re related to at data collection time.

Enhanced Ecommerce demands that we expose data in a way that lets it link interactions to products. Maybe we should do the same for all business entities.

Let me say it again: we have to link interactions to the business entities they’re related to at data collection time.

This statement marks a significant shift in my approach to designing the analytics data collection model and how I envision using a TMS (in my case Google Tag Manager) and the data layer (both in a conceptual sense as well as from a technical perspective):

Give interactions full richness of detail

Capture the interaction itself in as much detail as is feasible and necessary. Don’t get constrained by the 3 measly fields Google Analytics provide – eventCategory, eventAction and eventLabel. Think beyond this rigid structure.


Identify ALL business entities upfront, design their data model and find a way to easily link specific entity instances to interactions whenever and wherever possible via the data layer.


In effect, the anatomy of the interaction becomes:

subject + verb + object + context + entity

This new anatomy describes not only what happened, but what it means with respect to a thing that’s of particular importance to the business.
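
Here is a minimal sketch of what that could look like when pushed to a data layer. The `dataLayer` array is a plain local stand-in for the Google Tag Manager data layer, and the key names (`interaction`, `entities`, `attributes`) are hypothetical, not a GTM or Enhanced Ecommerce schema. The brand and price values are made up.

```typescript
// A reference to one specific instance of a business entity.
interface EntityRef {
  type: "product" | "promotion" | "transaction" | "campaign";
  id: string;
  attributes: Record<string, string | number>;
}

interface Interaction {
  subject: string;
  verb: string;
  object: string;
  context: Record<string, string>;
  entities: EntityRef[]; // the new, fifth building block
}

// Local stand-in for the GTM data layer (assumption: GTM itself is not loaded here).
const dataLayer: Array<{ event: string; interaction: Interaction }> = [];

function pushInteraction(i: Interaction): void {
  dataLayer.push({ event: "interaction", interaction: i });
}

// "user clicked link", linked to a specific product instance.
pushInteraction({
  subject: "user",
  verb: "clicked",
  object: "link",
  context: { page: "/category/example" },
  entities: [
    { type: "product", id: "4353", attributes: { brand: "Example Brand", price: 19.99 } },
  ],
});

console.log(dataLayer.length);
```

The object of the interaction is still the link, but the push now carries the product instance it relates to.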

The new anatomy affects us all

So why go through all that trouble? Because without it, reporting (and therefore analysis) becomes stunted. Have a look at this screenshot, which emulates what the data would look like at interaction level.

[Screenshot: interaction-level report example]

What we know about each interaction:

clicked to see product detail page

This click is associated with product (a business entity), but not with a specific instance of it. If we wanted to see which products get more clicks, we couldn’t. However, at least we know that users click on product detail links.


refunded product

This refund is associated with a specific product instance (SKU 4353). We can compare refunds between products but not between categories, brands or products with different price points.


added product to basket

This interaction is also associated with a specific product instance (SKU 4353) as well as its entire attribute set. It has the full richness of detail to feature in any slicing and dicing of data we may need to do.


landed on product detail page

Same as above. These last two interactions enable comprehensive product-level comparison (the cornerstone of good analysis). They not only link the interaction to product as a business entity, but to a specific instance of that entity (SKU 4353 and 9474) and to their full attribute set.


You’d be wrong to think this only affects analysts who do event-level analysis. The way Google Analytics processes data relies on this kind of model. Unless you pass the SKU or price of a product at the moment of interaction, how will it know to aggregate it into the reports? Data import is far from ideal (more on that later).
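
To illustrate why the attributes have to travel with the interaction, here is a rough sketch of aggregating the four interactions above by SKU and by brand. The `Row` type is hypothetical, the brand value is made up, and the `(not set)` bucket is just my way of labelling missing values in this example.

```typescript
interface Row {
  summary: string;
  sku?: string;   // present only when a specific product instance was linked
  brand?: string; // present only when the attribute set was passed too
}

const rows: Row[] = [
  { summary: "clicked to see product detail page" },                            // entity only
  { summary: "refunded product", sku: "4353" },                                 // instance, no attributes
  { summary: "added product to basket", sku: "4353", brand: "Example Brand" },  // instance + attributes
  { summary: "landed on product detail page", sku: "9474", brand: "Example Brand" },
];

// Count interactions per value of the chosen dimension.
function countBy(data: Row[], key: "sku" | "brand"): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of data) {
    const value = row[key] ?? "(not set)";
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}

const bySku = countBy(rows, "sku");
const byBrand = countBy(rows, "brand");
console.log(bySku, byBrand);
```

The refund can be counted per SKU but drops into `(not set)` as soon as we slice by brand, and the first click can’t be attributed to any product at all.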

In the next post I’ll explore the concepts of entity and interaction dictionaries and begin to get into the nitty gritty of putting this approach into practice.

Thoughts? Please let me know in the comments.
