The Lego Data Layer: How Does It Work in Practice?

In previous posts I argued that every interaction has a structure that applies almost universally. Of its components, the most important ones are the object of the interaction (what the user interacted with directly) and the business entity it relates to (the thing that’s of particular interest to the business to measure). These can be one and the same but often they are not.

If interactions are made up of components, then it makes sense to think of interactions as a Lego set. Take out one brick and replace it with another to come up with something entirely different.

Example

  • user clicks on link related to product
  • user clicks on link related to promo
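
The brick swap above can be sketched in a few lines of Python. This is only an illustration: the function name `clicked_link`, the `navigation` category and the `main_entity` field are assumptions made for the example, not part of any approved spec.

```python
def clicked_link(entity_type, entity_id):
    """Build a 'user clicks on link' interaction around one business entity."""
    return {
        "action": {"category": "navigation"},  # hypothetical category
        "object": {
            "type": "link",  # what the user interacted with directly
            "dict": {
                # the swappable brick: the business entity the link relates to
                "main_entity": {"type": entity_type, "id": entity_id},
            },
        },
    }


# Same interaction shape, different brick slotted in.
product_click = clicked_link("product", 450)
promo_click = clicked_link("promo", "SummerSale45")
```

Everything except the entity brick stays identical between the two interactions, which is the point of the Lego analogy.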

I also introduced the concept of entity dictionaries, a full and rich attribute set that defines every business entity (think product for ecommerce). This dictionary “travels” with the entity it belongs to whenever you slot that entity into a new interaction. It makes the design leaner, more efficient and easier to maintain.

But how does this actually work in practice? We know that JSON is the building block of the data layer from a technical point of view. We therefore need to map every interaction and its components to a JSON equivalent. This is how I envision an interaction “recipe” (also known as a dictionary) in JSON format:

"added_product_to_basket": {
    "action": {
        "category": "shopping",
        "timestamp": "1410962241"
    },
    "user": <user dict>,
    "object": {
        "type": "product",
        "dict": <product dict>
    },
    "context": {
        "notification": <notification dict>,
        "checkout": <checkout dict>
    }
}

Looks a bit messy and hard to read, doesn’t it? Let’s strip out the noise and create a human-friendly version of it:

added_product_to_basket: 
    action: 
        category: shopping
        timestamp: 1410962241
    user: <user dict>
    object: 
        type: product
        dict: <product dict>
    context: 
        notification: <notification dict>
        checkout: <checkout dict>

This “recipe” describes the interaction fully. The user added a product (say, product 354) to the basket at a specific moment in time and was shown a success message (the notification). At the time of the interaction, a checkout had already been initiated. Rich details about the user, the product added to the basket, the notification seen and the existing checkout are all available because the dictionaries are slotted into this structure.

With the proper system and documentation in place, these “recipes” go through a carefully controlled process of design, validation, approval and version control (the subject of the next post).

Here’s another one:

applied_promo_to_product: 
    user: <user dict>
    action: 
        category: shopping
        timestamp: 1410962241
    object: 
        type: promo
        dict: <promo dict>
    context: 
        notification: <notification dict>
        checkout: <checkout dict>

The same interaction as above, but with the promo dictionary expanded:

applied_promo_to_product: 
    user: <user dict>
    action: 
        category: shopping
        timestamp: 1410962241
    object: 
        type: promo
        dict: 
            id: SummerSale45
            amount_percent: 10
            creative_id: 2
            main_entity: 
                type: product
                id: 450
    context: 
        notification: <notification dict>
        checkout: <checkout dict>

NOTE: I want to keep my interaction recipes simple. Unless a main_entity is explicitly declared (in this case, the promo advertises a specific product), I’ll assume that the object and the business entity are one and the same.
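
The fallback rule in this note can be written down as a small function. This is a minimal sketch, assuming the interaction has already been expanded into plain Python dictionaries; the name `business_entity` is made up for the example.

```python
def business_entity(interaction):
    """Return (type, id) of the business entity an interaction relates to."""
    obj = interaction["object"]
    declared = obj["dict"].get("main_entity")
    if declared is not None:
        # main_entity is explicitly declared: the object is not the entity
        return declared["type"], declared["id"]
    # no main_entity: object and business entity are one and the same
    return obj["type"], obj["dict"].get("id")


promo_interaction = {
    "object": {
        "type": "promo",
        "dict": {
            "id": "SummerSale45",
            "main_entity": {"type": "product", "id": 450},
        },
    }
}
product_interaction = {"object": {"type": "product", "dict": {"id": 354}}}

print(business_entity(promo_interaction))    # ('product', 450)
print(business_entity(product_interaction))  # ('product', 354)
```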

When developers hand this kind of recipe over to the server, the server grabs the relevant dictionaries and expands them in the appropriate slots. This creates a complete JSON object that captures the richness of that interaction. Now, imagine how convoluted and prone to breakage this JSON would be if it had to be constructed manually, attribute by attribute, whenever an interaction involved a promo. Nightmare!
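
One way to picture the expansion step is a recursive walk over the recipe that swaps each `<... dict>` placeholder for the matching entity dictionary. The sketch below is an assumption about how this could be done, not a description of any particular platform; the `expand` function, the placeholder pattern and the sample values (other than the promo attributes shown earlier) are hypothetical.

```python
import copy
import re

# Matches slots written as "<user dict>", "<promo dict>", and so on.
PLACEHOLDER = re.compile(r"<(\w+) dict>")


def expand(recipe, dictionaries):
    """Return a copy of the recipe with every placeholder slot filled in."""
    if isinstance(recipe, dict):
        return {key: expand(value, dictionaries) for key, value in recipe.items()}
    if isinstance(recipe, str):
        match = PLACEHOLDER.fullmatch(recipe)
        if match:
            # raises KeyError if the recipe asks for an unknown dictionary
            return copy.deepcopy(dictionaries[match.group(1)])
    return recipe


recipe = {
    "applied_promo_to_product": {
        "user": "<user dict>",
        "action": {"category": "shopping", "timestamp": "1410962241"},
        "object": {"type": "promo", "dict": "<promo dict>"},
        "context": {
            "notification": "<notification dict>",
            "checkout": "<checkout dict>",
        },
    }
}
dictionaries = {
    "user": {"id": "u-1"},  # sample values for illustration only
    "promo": {
        "id": "SummerSale45",
        "amount_percent": 10,
        "creative_id": 2,
        "main_entity": {"type": "product", "id": 450},
    },
    "notification": {"type": "success"},
    "checkout": {"id": "c-9"},
}

expanded = expand(recipe, dictionaries)
```

The recipe itself is left untouched, so the same template can be reused for every occurrence of the interaction.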

“Listening” for interaction success

Tracking involves “listening” (client-side or server-side) for confirmation that an interaction has occurred. As soon as that confirmation becomes available, we transfer our interaction dictionary to GTM’s dataLayer (or whatever TMS you have in place). For interactions captured server-side, the process is shown below (simplified):

[Figure: The developer process for passing interaction dictionaries to Google Tag Manager]

Once you’ve finalised documentation for each dictionary (both entities and interactions), the developer process is (loosely) as follows:

1. Create helper functions to generate entity dictionaries server-side


In pseudo code:

function helper_product_dict() {
    // retrieve product-specific attributes according to spec
    // from backend (if they are available in the current context)

    // if they do not exist, set the attributes to 
    // fallback values (an empty string)

    // populate the approved JSON dictionary structure with
    // the available product attributes or their default values

    return product_dictionary, product_id 
}

Rinse and repeat for all business entities.
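
For readers who prefer something concrete, here is how such a helper might look in Python. It is a sketch under assumptions: the attribute list in `PRODUCT_SPEC` is invented for the example, and the backend lookup is replaced by a plain mapping passed in as an argument.

```python
# Hypothetical attribute list; the real one comes from your approved spec.
PRODUCT_SPEC = ("id", "name", "category", "price")


def helper_product_dict(backend_product):
    """Build the product dictionary from whatever the backend has in context.

    backend_product is a mapping of product attributes, or None when no
    product exists in the current context.
    """
    source = backend_product or {}
    product_dict = {}
    for attribute in PRODUCT_SPEC:
        value = source.get(attribute)
        # fall back to an empty string when the attribute is unavailable
        product_dict[attribute] = "" if value is None else value
    return product_dict


print(helper_product_dict({"id": 354, "name": "Mug"}))
# {'id': 354, 'name': 'Mug', 'category': '', 'price': ''}
```

Because every key in the spec is always present, downstream consumers never have to guess whether an attribute exists.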

This is a bit of code magic that, when called, produces a dictionary for whatever product exists in the current context. This dictionary then becomes available as a stand-alone building block. If this were Lego, it would be akin to handing someone a bunch of bricks and asking them to assemble a fully formed car. Then, when your scene calls for a car, you just grab it from the box.

The data layer is like a box of ready-made Lego characters (but for analytics).

NOTE: This assumes that you’ve already done the groundwork to create an approved dictionary structure for product (we’ll talk about this in the next post).

2. “Call” the helper functions immediately after the server records the interaction


In pseudo code:

// server records that 'add to basket' interaction has occurred

// now, make dictionaries relevant to interaction available as variables
user_dict = helper_user_dict()
product_dict = helper_product_dict() 
notification_dict = helper_notification_dict()
checkout_dict = helper_checkout_dict()

// and then, assemble them according to your "recipe" and transfer it to GTM `dataLayer`
print(
    dataLayer.push({
        "event": "added_product_to_basket",
        "added_product_to_basket": {
            "action": {
                "category": "shopping",
                "timestamp": "1410962241"
            },
            "user": user_dict,
            "object": {
                "type": "product",
                "dict": product_dict
            },
            "context": {
                "notification": notification_dict,
                "checkout": checkout_dict
            }
        }
    });
)

Rinse and repeat for all other interactions.
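
As a rough Python equivalent of the pseudo code above, the server can assemble the payload as a dictionary and print it into the page as a script snippet. Treat this as a sketch: `build_payload` and `render_push` are hypothetical names, the sample dictionaries stand in for the helper functions, and the `</` escaping is a precaution I’ve added so that a stray `</script>` inside a value cannot close the tag early.

```python
import json
import time


def build_payload(event_name, user_dict, product_dict, notification_dict, checkout_dict):
    """Assemble the interaction dictionary according to the recipe."""
    return {
        "event": event_name,
        event_name: {
            "action": {"category": "shopping", "timestamp": str(int(time.time()))},
            "user": user_dict,
            "object": {"type": "product", "dict": product_dict},
            "context": {"notification": notification_dict, "checkout": checkout_dict},
        },
    }


def render_push(payload):
    """Render the payload as a script snippet that pushes it to dataLayer."""
    body = json.dumps(payload).replace("</", "<\\/")  # keep values inside the tag
    return (
        "<script>window.dataLayer = window.dataLayer || [];"
        "dataLayer.push(" + body + ");</script>"
    )


payload = build_payload(
    "added_product_to_basket",
    {"id": "u-1"},          # would come from helper_user_dict()
    {"id": 354},            # would come from helper_product_dict()
    {"type": "success"},    # would come from helper_notification_dict()
    {"id": "c-9"},          # would come from helper_checkout_dict()
)
snippet = render_push(payload)
```

Serialising with `json.dumps` rather than concatenating strings by hand avoids the missing-comma and unquoted-value mistakes that creep into hand-built JSON.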

A couple of notes:

Performance concerns

Every additional call to the backend puts a strain on the server. However, these dictionaries are usually composed of values that are already required by the core functionality of the page. We’re simply “packaging” them up in a dedicated structure (e.g. all attributes in the product dictionary are already in use on the product detail page). Regardless, the first step is to quantify the additional strain and then optimise it.
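
A simple way to start quantifying that strain is to time each helper call. The wrapper below is only a sketch; `timed` is a made-up name and the stand-in helper replaces a real backend lookup.

```python
import time


def timed(helper, *args):
    """Call a helper and return its result plus the elapsed time in ms."""
    start = time.perf_counter()
    result = helper(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


def helper_product_dict():
    # stand-in for a real helper that would query the backend
    return {"id": 354}


product_dict, cost_ms = timed(helper_product_dict)
print(f"helper_product_dict took {cost_ms:.3f} ms")
```

Logging these timings per helper over real traffic shows which dictionaries are worth caching or trimming.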


Keep a repo of dictionaries and “hook” them in when required

Instead of littering the website’s code with these dictionaries, keep them and the helper functions all in a single place. Then grab only what is required. Many modern platforms support “hooks” to which this approach lends itself well.
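
A central registry is one way to keep the helpers in a single place and grab only what an interaction needs. The decorator-based sketch below is an assumption about how you might organise this in Python; `HELPERS`, `entity_helper` and `collect` are illustrative names, and `context` stands for whatever request data your platform passes to its hooks.

```python
HELPERS = {}  # single registry of entity-dictionary helpers


def entity_helper(name):
    """Register a helper function under an entity name."""
    def register(func):
        HELPERS[name] = func
        return func
    return register


@entity_helper("product")
def product_dict(context):
    return {"id": context.get("product_id", "")}


@entity_helper("user")
def user_dict(context):
    return {"id": context.get("user_id", "")}


def collect(names, context):
    """Build only the dictionaries a given interaction requires."""
    return {name: HELPERS[name](context) for name in names}


print(collect(["product"], {"product_id": 354, "user_id": "u-1"}))
# {'product': {'id': 354}}
```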


Granular setups can have leaner dictionaries

Rich interaction dictionaries are particularly useful for traditional GA setups where there is no user/event-level tracking. They plug some worrying gaps in the data. For event-level setups, though, much of the context can be inferred after data collection (probably in the earliest stages of data exploration). Whilst we preserve the general “recipe” for each interaction, its building blocks can be a lot leaner.
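
Slimming a building block down can be as simple as keeping a whitelist of attributes. This is a sketch with an assumed default of keeping only `id`; which attributes to keep is a decision for your own setup.

```python
def lean(dictionary, keep=("id",)):
    """Return a slimmed-down copy containing only the attributes in keep."""
    return {key: value for key, value in dictionary.items() if key in keep}


full_promo = {"id": "SummerSale45", "amount_percent": 10, "creative_id": 2}

print(lean(full_promo))                              # {'id': 'SummerSale45'}
print(lean(full_promo, keep=("id", "creative_id")))  # {'id': 'SummerSale45', 'creative_id': 2}
```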


This is how it works in a nutshell. We “package” the elements that are relevant to any given interaction on the fly and then pass that entire package as a JSON object to Google Tag Manager. There are many subtleties to the process, some of which I’ll cover in future posts.

Thoughts? Please let me know in the comments.
