The Lego Data Layer: Entity and Interaction Dictionaries

In the previous post we looked at the basic anatomy of an interaction and how similar it is to English grammar. I argued that (almost) every interaction is related to a thing that is of particular interest to the business. I call this an entity to distinguish it from the object of the interaction (e.g. the link the user clicked on). Read part one if you’ve missed it.

In this post I want to explore the concept of an entity dictionary. Have a look at this screenshot, which emulates an interaction-level report.


What’s wrong with this picture? Well, the first two interactions are missing all or some of the attributes associated with the product. These sorts of “gaps” stunt analysis. The solution? Treat the entire attribute set of the product as a single unit: a dictionary of sorts that always “travels” with the product SKU at data collection time (this is preferable to data import).

What’s an entity dictionary

Entities have one or more attributes (e.g. a product has a category, brand, cost, etc.). Some attributes exist regardless of whether anyone ever interacts with the entity. Other attributes are gained only upon user interaction (e.g. price shown, color chosen, etc.).

(Side note: these latter attributes can only be known at interaction time in a personalised environment and therefore cannot be sent via data import).

Collectively, these attributes form the entity dictionary. This can be as short as the entity’s id and name, and nothing else.
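
As a rough illustration, an entity dictionary could be sketched as a plain Python dict. Everything named here (the CATALOGUE lookup, the product_dictionary helper, attribute names such as price_shown) is made up for the example and not taken from any real data layer.

```python
# Hypothetical product catalogue: attributes that exist whether or not
# anyone ever interacts with the product. All values are illustrative.
CATALOGUE = {
    "SKU-123": {
        "sku": "SKU-123",
        "name": "Trail Shoe",
        "category": "Footwear",
        "brand": "Acme",
        "cost": 40.0,
    },
}


def product_dictionary(sku, **interaction_attributes):
    """Return the full entity dictionary for a product.

    Catalogue attributes always travel with the SKU; attributes that are
    only known at interaction time (e.g. price shown) are merged on top.
    """
    entity = dict(CATALOGUE[sku])          # copy, so the catalogue is not mutated
    entity.update(interaction_attributes)  # e.g. price_shown, color_chosen
    return entity


# The shortest possible dictionary: an id and a name, nothing else.
minimal_entity = {"id": "PROMO-1", "name": "Summer Sale"}

viewed = product_dictionary("SKU-123", price_shown=59.99, color_chosen="blue")
print(viewed)
```

Because the helper always starts from the full catalogue record, a hit built this way cannot arrive with only some of the product attributes filled in.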

What’s an interaction dictionary

Now, we know that every interaction has a well defined anatomy. And we’ve also established that each business entity has a well defined dictionary to support it.

So an interaction that looks like this:


is enriched by entity dictionaries to become this:


Every entity (user, product and basket/checkout) becomes a standalone, fully described thing and the interaction is simply an interconnected system composed of these entities.

When a different interaction occurs, the dictionary remains attached to the entity it belongs to:


The only difference between these last two interactions is that one is related to promo, the other to product. We simply swap one entity (and dictionary) for the other to obtain an interaction with a different business meaning.
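
To make the swap concrete, here is a minimal sketch under my own assumptions: the entity contents, the build_interaction helper and the key names are all hypothetical. It only shows that two interactions can share every building block except the entity they are about.

```python
# Hypothetical entity dictionaries; all names and values are illustrative.
user = {"id": "U-1", "type": "returning", "logged_in": True}
basket = {"id": "B-9", "items": 2, "value": 120.0}
product = {"sku": "SKU-123", "name": "Trail Shoe", "brand": "Acme"}
promo = {"id": "PROMO-1", "name": "Summer Sale", "position": "hero banner"}


def build_interaction(action, entity_type, entity, user, basket):
    """Assemble an interaction from ready-made entity dictionaries."""
    return {
        "action": action,
        "entity_type": entity_type,
        "entity": entity,  # the thing the interaction is really about
        "user": user,
        "basket": basket,
    }


product_click = build_interaction("click", "product", product, user, basket)
promo_click = build_interaction("click", "promo", promo, user, basket)

# Only the entity (and its type label) differs between the two interactions.
differing = sorted(k for k in product_click if product_click[k] != promo_click[k])
print(differing)  # ['entity', 'entity_type']
```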

Starting to sound familiar? That’s because this design approach draws heavily from object-oriented design, which revolves around a system of interacting objects (instances of classes). This is intentional (but in a natural kind of way).

I believe this will make intuitive sense to:

Business stakeholders

When you focus the discussion on the entities that support the business (rather than the technical stuff or GA lingo), it makes brainstorming and prioritising analytics requirements easier.


Developers

Apps and websites already use a similar design model to power the very functionality of the website. This approach will make intuitive sense to developers (thanks go to Yali Sassoon from Snowplow for making this critical point to me).
Not just that, the lego approach relies on reusing the backend design model that developers have already built. There’s less work involved, more robustness and tighter integration with the core underlying logic of the business and app.

You see, this “lego” design of the analytics data collection model starts with the business model. And I am referring specifically to the data collection model. We identify the core entities that support the business, we dig deep into the design model of the website (which already embeds those entities neatly) and reuse its components. We adapt them and incorporate them into our data collection model. This results in a data layer that’s fully driven and shaped by the business model, and completely agnostic of analytics vendor and tag management system.
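
One way to picture the vendor-agnostic part: the data layer holds nested entity dictionaries, and a thin adapter translates them for whichever tool sits downstream. The sketch below is an assumption of mine, not a real vendor API; the flatten helper and its dotted key naming are invented purely for illustration.

```python
def flatten(interaction, prefix=""):
    """Flatten a nested interaction into dotted key/value pairs."""
    flat = {}
    for key, value in interaction.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into an entity dictionary, prefixing its keys.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Vendor-neutral interaction made of entity dictionaries (illustrative values).
interaction = {
    "action": "add_to_basket",
    "product": {"sku": "SKU-123", "brand": "Acme"},
    "user": {"id": "U-1"},
}

hit = flatten(interaction)
print(hit)
# {'action': 'add_to_basket', 'product.sku': 'SKU-123', 'product.brand': 'Acme', 'user.id': 'U-1'}
```

The entity dictionaries themselves never mention a vendor; only the adapter would change if the downstream tool did.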

Composition – your best friend in designing the analytics data collection model

Developers know that code repetition breeds error and inefficiency. That holds true for analytics as well. Instead of re-describing the interaction in full every time we send a hit to Google Analytics or some other vendor, we simply re-use the entity dictionaries we’ve already defined. We’re using the available building blocks and simply “slot” them into the structure of the new interaction. It really is a bit like Lego!

Have a look at these next two interactions. The only difference between them is the entity that they’re really about:



But composition goes much deeper than that. It doesn’t only apply when we construct our interactions; it also comes in handy when we build the entity dictionaries themselves. For example, a product category has a number of attributes of interest like size, name, level, etc. It also has a number of child products. Instead of re-declaring the dictionary for each product, we can simply call on our ready-made product dictionary and “slot” it in:


Using composition means that when a new product attribute is added to the analytics requirements (e.g. date added to site), this “slotted in” design pattern allows that attribute to automatically “travel” with the product, trickling down into the relevant entities and interactions that it’s a part of. This produces a system that’s robust and easy to build upon in future.
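
Here is a minimal sketch of that trickle-down effect, assuming a single PRODUCTS lookup as the source of product attributes. The helper names, SKUs and dates are all hypothetical.

```python
# Single source of truth for product attributes (illustrative values).
PRODUCTS = {
    "SKU-1": {"sku": "SKU-1", "name": "Trail Shoe", "brand": "Acme"},
    "SKU-2": {"sku": "SKU-2", "name": "Road Shoe", "brand": "Acme"},
}


def product_dictionary(sku):
    """Return a copy of the ready-made product dictionary."""
    return dict(PRODUCTS[sku])


def category_dictionary(name, level, skus):
    """Build a category dictionary by slotting in product dictionaries."""
    return {
        "name": name,
        "level": level,
        "size": len(skus),
        "products": [product_dictionary(sku) for sku in skus],
    }


def category_view(category):
    """An interaction whose entity is a category."""
    return {"action": "view", "entity_type": "category", "entity": category}


# New requirement: "date added to site". It is added in one place only.
for sku, date_added in {"SKU-1": "2016-01-10", "SKU-2": "2016-02-03"}.items():
    PRODUCTS[sku]["date_added"] = date_added

hit = category_view(category_dictionary("Footwear", 1, ["SKU-1", "SKU-2"]))
print([p["date_added"] for p in hit["entity"]["products"]])
# ['2016-01-10', '2016-02-03']
```

Neither category_dictionary nor category_view had to change: the new attribute reached the category interaction simply because the product dictionary is reused rather than re-declared.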

This is already what developers do! They just rarely do it with analytics or if they do, it’s often a “patch-up” kind of job (probably because they are never involved in the early design phase).

Enough with theory. In my next post I’ll talk about how to get this stuff in place: a step-by-step guide that covers design, process and GTM instrumentation.

Thoughts? Please let me know in the comments.