The Lego Data Layer: Managing Event and Entity Dictionaries

My last post gave a simplified overview of the developer process behind this approach. I showed how the dev teams would create a set of helper functions to generate these dictionaries server-side. These functions would then be called whenever a new interaction is “packaged” up. The resulting interaction dictionary is then passed off as a whole to Google Tag Manager for the next stage in the process.

But now it’s time to go back to the start: the actual process for fleshing out and managing these dictionaries before any code is written.

Warning: Seriously unconventional analytics work ahead (done for ecommerce but concepts apply to other business models). From here onwards it can get pretty geeky at times. I’m using this series of posts as a way to document the process for myself.

1. Identify interactions and associated business entities

Remember the anatomy of an interaction

Every interaction follows (more or less) this structure:
subject + verb + object + context + entity
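
As a rough illustration, that structure could be expressed as a plain JavaScript object. This is only a sketch: the helper name `buildInteraction`, the field names and the sample values are all assumptions made for this example, not the actual helper functions from the previous post.

```javascript
// Hypothetical helper: packages the five parts of an interaction into one object.
// Field names and sample values are illustrative assumptions.
function buildInteraction(subject, verb, object, context, entity) {
  return { subject, verb, object, context, entity };
}

// "user clicks on thumbnail image for product in snippet on product category page"
const interaction = buildInteraction(
  { type: "user" },
  "click",
  { type: "call_to_action", name: "thumbnail_image" },
  { page_type: "product_category" },
  { type: "product", id: "product_345" }
);

console.log(JSON.stringify(interaction, null, 2));
```

Note how the object (the thumbnail image) and the entity (the product) are kept apart: the click lands on a call to action, but the interaction is really about the product.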


Document ALL interactions fully then prioritise them

Use wireframes or dev/live website walkthroughs to document all interactions of interest. Aim to document each interaction fully, which means identifying everything that is directly or indirectly part of it or affected by it. Write it down in plain English, being as precise and comprehensive as possible. Don’t forget to document trigger elements (for client-side events). Then prioritise the interactions (I use MoSCoW tags for this).


Example:

  • user clicks on carousel image in product category page
  • user clicks on thumbnail image for product in snippet on product category page

Extract nouns

Go through the interactions and make a list of all the nouns. Identify nouns that keep coming up or are somehow alike. Some are obvious (like product), others not so much (carousel image and thumbnail image are alike — they’re both calls to action).


First, identify business entities

Of the nouns found, which ones are fundamental to the business model? These are the first-class entities we’ll be building dictionaries for. Which nouns are important but not critical? If in doubt, ask yourself: will you ever need to compare one instance against another to answer a question? Does the noun support a first-class entity? If the answer is yes, then that noun is also an entity (but a second-class one).


Example (ecommerce):

  • First class entities: product, product category, promotion, transaction, checkout, etc — (these are critical).
  • Second class entities: blog post, blog category, filter, checkout stage, checkout option, search result, search result collection etc — (these not as much OR they support 1st class business entities).

Then, identify other recurring nouns

All of the other nouns that keep coming up act as simple interaction objects or they form the interaction context. We’ll build dictionaries for these too (even though we acknowledge that they’re not business entities). They’re most likely related somehow to the business entities we care about the most.


Example:

content tab, content item, call to action, notification, help item, form, form field, etc — These might be part of the context, they might be the interaction trigger OR might even act as generic “base” objects that entities are built upon (e.g. a promo is also a call to action whilst being so much more than that).

Identify entity – interaction associations

Go over interactions again and see which ones are actually about a business entity (regardless of the interaction object). Explicitly document the entity in a separate column.


Summarise the interaction and its elements

Summary should include: interaction name (choose wisely, this needs to be unique and will show up in reports; needs to be self-explanatory yet brief), interaction category (a roll-up term for multiple related interactions; will be used in reports), subject, object, business entity it directly relates to, and last but not least, context objects and entities.


Example:

interaction-summary-google-sheets
This interaction is unambiguously and directly related to a checkout stage. But I’d argue that at a business level, what we really care about is the checkout that the checkout stage belongs to.

Generate the interaction dictionaries

The script automatically uses the elements of the interaction to produce the JSON spec developers will be working from.
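
To give a feel for what that conversion might look like, here is a minimal sketch in plain JavaScript. It assumes a summary row has already been read from the sheet into an object; the column names, the `toInteractionSpec` function and the sample row are hypothetical and not the actual script.

```javascript
// Hypothetical summary row, mirroring the summary columns described above.
const summaryRow = {
  name: "view_checkout_stage",
  category: "checkout",
  subject: "user",
  object: "checkout_stage",
  entity: "checkout",
  context: ["checkout_option"],
};

// Turns one summary row into a skeleton interaction dictionary.
// Each part only carries its type here; the full entity dictionary
// would be slotted in where the type is named.
function toInteractionSpec(row) {
  const spec = {
    interaction: { name: row.name, category: row.category },
    subject: { type: row.subject },
    object: { type: row.object },
    entity: { type: row.entity },
    context: {},
  };
  for (const item of row.context) {
    spec.context[item] = { type: item };
  }
  return spec;
}

const spec = toInteractionSpec(summaryRow);
console.log(JSON.stringify(spec, null, 2));
```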


Iterate

It takes several passes to boil richly described interactions down into components. There will be undiscovered entities and objects with every pass, interactions that involve some repetition or interactions where the business entity it directly relates to isn’t clear. The design becomes clearer with every pass.


2. Flesh out dictionaries for entities

The entity and interaction dictionaries will need to be tightly integrated with Google Tag Manager (my TMS of choice) and its building block is, of course, JSON.

I therefore needed something like this:

  • User-friendly way for analysts to develop these dictionaries collaboratively, make additions and changes without getting lost in code.
  • Automatically convert them into precise JSON specifications to hand over to developers.
  • Commit-style version control, highlighting changes (similar to GitHub) and email notification on changes.
  • Act as the single source of truth for the data layer instrumentation. Can’t stress how critical that is for this kind of unconventional analytics setup.

Enter Google Sheets with Google Apps Script: the familiar environment of a spreadsheet turbocharged by the amazing functionality of Google Apps Script.

Meet the entity dictionary workhorse

The product dictionary worksheet

product-dictionary-worksheet

The product collection dictionary worksheet

product-collection-dictionary-worksheet

The checkout option dictionary worksheet

checkout-option-dictionary-worksheet

What are we looking at?

This is where the bulk of the work happens. Attributes are added following discussions with developers and business stakeholders and grouped logically.

We then assign priority tags and decide which attributes will make it into the current work package. We can also jot down candidates for future cycles as well as think ahead to analysis time and write down metrics and dimensions which will prove relevant. We also mark each attribute with client-side or server-side tags, depending on where the push to GTM will come from.

The dictionary is a tree

Highlighted in red is a branch. Related or similar attributes are grouped together under the same tree branch. The last cell on the row (green cell) is the value of the attribute. The branch nodes leading up to it are for classification only (inspired by mind-mapping software and JSON structure). We’ll grab these values (the green cells) from the backend wherever possible.
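
The mapping from worksheet rows to a nested structure can be sketched as follows. This is an illustration only: it assumes each row has already been reduced to a path of node names plus a value placeholder, and the `rowsToTree` function and sample rows are made up for this example.

```javascript
// Builds a nested object from rows of { path, value }.
// `path` holds the branch nodes; the last item is the attribute (the green cell).
function rowsToTree(rows) {
  const tree = {};
  for (const { path, value } of rows) {
    let node = tree;
    path.forEach((key, i) => {
      if (i === path.length - 1) {
        node[key] = value; // leaf: the attribute value
      } else {
        if (typeof node[key] !== "object" || node[key] === null) {
          node[key] = {}; // create the branch node on first use
        }
        node = node[key];
      }
    });
  }
  return tree;
}

// Hypothetical rows for a product dictionary.
const rows = [
  { path: ["product", "id"], value: "<product id>" },
  { path: ["product", "brand", "name"], value: "<brand name>" },
  { path: ["product", "brand", "variation"], value: "<brand variation>" },
];

const tree = rowsToTree(rows);
console.log(JSON.stringify(tree, null, 2));
```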


Flat or nested?

Some will disagree saying that a flat tree (and therefore JSON) structure is preferable (e.g. brand_name and brand_variation instead of the current brand branch). However, I find longer branches with short node names easier to follow (reader friendliness comes first!). More importantly though, dot notation allows me to pass an entire branch to another JavaScript object without targeting each attribute explicitly. That’s a core requirement of reusable, object-oriented design patterns.
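
A small sketch of the difference, using made-up values (the brand names and the `nestedContext` / `flatContext` variables are assumptions for illustration):

```javascript
// Nested: the whole brand branch travels as one unit.
const product = {
  id: "product_345",
  brand: { name: "Acme", variation: "Pro" },
};
const nestedContext = { brand: product.brand };

// Flat: every attribute has to be targeted explicitly,
// and any new brand attribute means touching this code again.
const flatProduct = {
  id: "product_345",
  brand_name: "Acme",
  brand_variation: "Pro",
};
const flatContext = {
  brand_name: flatProduct.brand_name,
  brand_variation: flatProduct.brand_variation,
};

console.log(nestedContext.brand.variation, flatContext.brand_variation);
```

One thing to keep in mind: `product.brand` is passed by reference here, so a change made through `nestedContext.brand` also shows up on `product`. Copy the branch first if that is not what you want.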


Lists of child objects

Some entities (like product collection) are composed of child objects. In this case, products. The :list indicates that from there onwards, the contents of that branch will be a list of product nodes. For each product, its dictionary is simply “slotted” in.


Lists, NOT arrays

Debated this a while and opted against arrays when children are business entities. Will revisit why in a future post but suffice to say for now that lists of JSON nodes allow you to access a specific node (and the dictionary within) directly via dot notation, like so: product_collection.products.product_345, rather than looping over an array. We’ll use this for fast lookups. Neat, huh?
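
For comparison, here is a sketch of both lookups. The product names and the second product key are invented for the example; only the product_collection.products.product_345 path comes from the text above.

```javascript
// Children keyed by id: each product dictionary is slotted in under its own key.
const product_collection = {
  products: {
    product_345: { id: "product_345", name: "Blue kettle" },
    product_678: { id: "product_678", name: "Red toaster" },
  },
};

// Direct lookup via dot notation, no loop needed.
const direct = product_collection.products.product_345;

// The array equivalent has to search for the matching id.
const productArray = Object.values(product_collection.products);
const searched = productArray.find((p) => p.id === "product_345");

console.log(direct.name, searched.name);
```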


Explicitly document if an object or entity is in reality related to another, more important business entity. It is often enough to document its type and id (can substitute id for name if needed).


Informational tags for management

Each tree node has a bracket with some text. These tidbits of text are the management language of the dictionary. They pack a lot of information and are used to manage the work/requirements (more on that below). The script uses “clues” in this language to automatically produce different summaries and specifications.


Worksheet conventions:

Priority tags (MoSCoW)

The first single letter after the opening bracket, as in (M, …), indicates priority status according to the MoSCoW method. By default, only attributes marked M (Must have) make it into the spec passed to developers. This allows the analyst and business stakeholders to continue working on the dictionary without affecting the current work package.


“Moment of use” tags

Following the MoSCoW flag is a tidbit of text that tells us whether that attribute will need to be passed to the data layer server-side (SS), client-side (CS) or not passed at all but inferred later (Analysis). This allows the analyst to explore requirements along the entire analytics continuum, from data collection to analysis, all from within a single repository.
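
Here is a rough sketch of how a script could read these two tags from a cell. The exact cell format is an assumption on my part for this example (node name, then a bracket holding the priority letter, a comma and the moment-of-use text), as are the `parseNode` function and the sample cells.

```javascript
// Parses a cell such as "name (M, SS)" into its parts.
// Assumed format: <json_key> (<M|S|C|W>, <moment of use>)
function parseNode(cellText) {
  const match = /^([^\s(]+)\s*\(\s*([MSCW])\s*,\s*([^)]+?)\s*\)\s*$/.exec(cellText);
  if (!match) return null; // cell does not follow the assumed convention
  return { key: match[1], priority: match[2], momentOfUse: match[3] };
}

// Hypothetical cells from a product worksheet.
const cells = [
  "name (M, SS)",
  "variation (S, SS)",
  "colour (M, CS data-*)",
  "margin (C, Analysis)",
];

const parsed = cells.map(parseNode);

// Only Must-have attributes make it into the current work package.
const mustHaves = parsed.filter((n) => n !== null && n.priority === "M");
console.log(mustHaves.map((n) => n.key)); // [ 'name', 'colour' ]
```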


Sheet name

Each worksheet is named after its entity (e.g. product_dict). The entity name is also the first cell, which makes it clear what dictionary I’m editing.


No blank spaces in tree node names

The text for each tree node (until the opening bracket) will be used as a JSON key. Avoid blank spaces. Underscore vs camelCase is merely a question of personal preference.


Colour coding

Tree nodes are colour coded according to priority. Make the highest priority ones stand out whilst avoiding visual noise from the lowest priority ones.


Every element in this dictionary is carefully thought through so that relevant documentation can be generated with as little manual work (which increases risk of human error) as possible.

TODO:

  • Add done tags for completed nodes and produce summary for completed parts of the dictionary.

3. Automatically generate entity dictionary documentation

I’ve written some code in Google Apps Script to use the individual dictionary worksheets to automatically generate a summary sheet with the following:

Dictionary changes

A list of attributes added / removed from dictionary compared to previous version (GitHub style). These are documented using dot notation as this is easier to read. Only attributes marked M — relevant to current work package — are included.


dictionary-changes-github-diff
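
The comparison itself can be sketched in a few lines of plain JavaScript. This is not the actual script: it assumes both versions are already available as nested objects containing only M attributes, and the `flatten` and `diffDictionaries` functions and sample dictionaries are invented for illustration.

```javascript
// Lists every leaf attribute of a nested dictionary in dot notation.
function flatten(node, prefix = "") {
  const paths = [];
  for (const [key, value] of Object.entries(node)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      paths.push(...flatten(value, path));
    } else {
      paths.push(path);
    }
  }
  return paths;
}

// Compares two versions and reports attributes added and removed.
function diffDictionaries(previous, current) {
  const before = new Set(flatten(previous));
  const after = new Set(flatten(current));
  return {
    added: [...after].filter((p) => !before.has(p)),
    removed: [...before].filter((p) => !after.has(p)),
  };
}

// Hypothetical previous and current versions of a product dictionary.
const v1 = { product: { id: "", brand: { name: "" }, colour: "" } };
const v2 = { product: { id: "", brand: { name: "", variation: "" } } };

const changes = diffDictionaries(v1, v2);
console.log(changes);
// { added: [ 'product.brand.variation' ], removed: [ 'product.colour' ] }
```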

Human-friendly dictionary

It can get tiresome to try and read JSONs. I decided to strip out the noise and leave a barebones version that I can actually read. Hierarchical structure is preserved.


dictionary-reader-friendly
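
One way to produce such a barebones view is to print only the keys, indented by depth. A minimal sketch, assuming the dictionary is available as a nested object (the `toOutline` function and the sample dictionary are hypothetical):

```javascript
// Renders a nested dictionary as an indented outline of keys only.
function toOutline(node, depth = 0) {
  const lines = [];
  for (const [key, value] of Object.entries(node)) {
    lines.push("  ".repeat(depth) + key);
    if (value !== null && typeof value === "object") {
      lines.push(...toOutline(value, depth + 1)); // descend into the branch
    }
  }
  return lines;
}

// Hypothetical product dictionary; values are irrelevant for the outline.
const dictionary = {
  product: { id: "", brand: { name: "", variation: "" } },
};

const outline = toOutline(dictionary).join("\n");
console.log(outline);
// product
//   id
//   brand
//     name
//     variation
```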

Precise JSON specification

The spec developers will work from to generate dictionaries server-side with helpful commentary as to what each attribute represents.


precise-json-spec

HTML5 markup requirements

For client-side interaction tracking, additional markup requirements become inevitable. The script screens all attributes and those marked with CS data-* (and only them) get converted into HTML5 dataset markup.


html5-markup-dataset
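
A sketch of that screening step is below. The attribute list, the `toDataAttribute` and `toMarkupRequirements` functions and the naming rule (dots and underscores become hyphens) are my assumptions for this example; the real script may name the attributes differently.

```javascript
// Converts a dot-notation path into an HTML5 data-* attribute name.
// Assumed rule: lowercase, with dots and underscores replaced by hyphens.
function toDataAttribute(path) {
  return "data-" + path.toLowerCase().replace(/[._]+/g, "-");
}

// Keeps only attributes tagged "CS data-*" and returns their attribute names.
function toMarkupRequirements(attributes) {
  return attributes
    .filter((a) => a.momentOfUse === "CS data-*")
    .map((a) => toDataAttribute(a.path));
}

// Hypothetical attributes with their moment-of-use tags.
const attributes = [
  { path: "product.id", momentOfUse: "CS data-*" },
  { path: "product.brand.name", momentOfUse: "CS data-*" },
  { path: "product.price", momentOfUse: "SS" },
];

const markup = toMarkupRequirements(attributes);
console.log(markup); // [ 'data-product-id', 'data-product-brand-name' ]
```

In the browser, an attribute such as data-product-brand-name can then be read back through the element’s dataset property as element.dataset.productBrandName.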

Current version number and last updated

Basic elements of version control are: entity name, current version number and last modified date.


google-sheets-dictionary-version-control

TODO:

  • Add commit-control. Prompt for commit comment when changes have been made.
  • Add email notification to relevant parties when changes are made.

A static document would not work for this. Simo Ahava recently wrote in his Data Layer blog post:

“it’s extremely important to treat the Data Layer as a living, agile model, not a stagnated, monolithic, singular entity.”

I fully agree. This kind of tool (and the rest of the lego-inspired design model) allows you to start with the shortest, leanest dictionaries you need and then expand them over time as and when requirements demand it.

Thoughts? Please let me know in the comments.
