Not Provided: How to Use Behavioural Analytics to Infer Keywords

I spent a lot of time in the past doing SEO whilst running my own websites, and I’ve always been fascinated by visitor intent. I used to spend hours sifting through keywords and working out what people really wanted from my website.

As an analytics consultant, I now use more sophisticated classification models designed to group large numbers of keywords into intent “buckets” and align these with the customer journey (the topic of another blog post). But Not Provided has made that virtually impossible for Google organic traffic.

Nothing can make up for the loss of actual data, but there are workarounds. One of them is behavioural analytics: using actual visitor behaviour to infer what the keyword used might have been.

Behavioural predictor for keywords: what people do first on a website

Search behaviour is a highly active, task-oriented process. The visitor’s mindset is rarely on browsing; it’s on finding and doing something.

First Action as an extension of search behaviour

The first thing that people do on a landing page is closely aligned with what they are there to do. But so are the keyphrases they searched with. So can we use the behaviour we do have data on (the first action on site) to infer the closely related behaviour we don’t (the keyword used)?

Yes, we can. And the beauty of it is that much of this behaviour is already being tracked in Google Analytics; we just never thought of looking at it that way.

In some cases, the first behaviour is even more accurate at predicting visitor intent than keyphrases ever were. Whenever I classified keywords by intent, I inevitably ended up with an important group of keywords that were so broad and vague that it was impossible to work out what people’s intent was. But when you take their first action into account as well, the meaning of a search for a broad term like your brand becomes much clearer.

That’s because what someone does first on a site represents a choice which is:

  • Made actively and consciously. In the process, other possible choices are rejected.
  • Aligned with the task someone is in the process of completing.

Inferring keywords from known visitor behaviour (some examples)

Let’s take a few examples to see how this might work in practice. There’s no rocket science involved. The trick is to learn to look for the underlying meaning behind interactions such as pages seen, buttons clicked, etc.

Example 1. Travel site, destination landing page


Sometimes the “first behaviour” is actually a set of micro-interactions that only have meaning when taken together. Many sites have an availability or search form with multiple filters and options. The sum of those choices, made as soon as someone lands on the page, represents a strong behavioural clue as to what their search query was.

  • First action – Visitor fills in availability form
  • What to track – A concatenated string of all values in chosen options:
    “santorini oia september 2013 2 adults self catering”
  • Inferred keyword (landing page + first action) – “greece holidays santorini oia september 2013 2 adults self catering”. In other words, “self catering holidays for couples”.
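
As a minimal sketch of the “concatenated string” idea, the Python below joins the chosen form values into one label and prefixes the landing page’s head term. The field names in `FORM_FIELDS` and the head term are hypothetical; in practice the values would come from whatever your availability form actually submits.

```python
# Hypothetical field names for an availability form, kept in a fixed order so
# that the same combination of choices always yields the same string.
FORM_FIELDS = ["destination", "area", "month", "year", "party", "board"]


def concat_form_values(choices: dict) -> str:
    """Join the chosen option values, in FORM_FIELDS order, into one lowercase string."""
    values = (str(choices.get(field, "")).strip().lower() for field in FORM_FIELDS)
    return " ".join(v for v in values if v)  # skip fields left empty


def inferred_keyword(landing_term: str, first_action: str) -> str:
    """Combine the landing page's head term with the first-action label."""
    return f"{landing_term.strip().lower()} {first_action}".strip()


choices = {"destination": "Santorini", "area": "Oia", "month": "September",
           "year": "2013", "party": "2 adults", "board": "Self catering"}
label = concat_form_values(choices)
print(label)  # santorini oia september 2013 2 adults self catering
print(inferred_keyword("greece holidays", label))
```

Fixing the field order is a deliberate choice: visitors may set the filters in any order, but the resulting label should be the same either way.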


One thing we often forget is that when people land on a page, the first thing they do is scan it for whatever most closely matches their mental context. If someone chooses “Regions in Greece” from all of the available options, it means that that’s their initial intent and interest.

  • First action – Visitor clicks on Regions
  • What to track – Anchor text of link clicked: “regions in greece”
  • Inferred keyword (landing page + first action) – “greece holidays regions”. In other words, “where to go on holiday in greece”


Behavioural scenarios are not always so clear cut. Banners and calls to action can sometimes distract someone from their original track. But more often than not, their first choice will still be closely aligned with the behaviour that started in search.

It’s unlikely that someone who searches for “luxury holidays in Greece” will choose to explore the budget holidays section as their first activity on site. You may not be able to infer the exact keyword used, but you may well be able to infer the market segment that someone belongs to. And that’s hugely useful.

  • First action – Visitor clicks on “Luxury holidays” banner
  • What to track – Custom attribute of banner image (or even the image file name, if relevant, which you can clean up later): “greece luxury deals”
  • Inferred keyword (landing page + first action) – “greece holidays luxury”
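
Cleaning up a banner’s image file name is mostly string handling. A minimal sketch, assuming a hypothetical naming convention like `Greece_Luxury-Deals_v2.jpg` (including the `_v2` version suffix): prefer a custom attribute when the banner has one, otherwise turn the file name into words.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def banner_label(image_src: str) -> str:
    """Turn a banner image URL or path into lowercase words."""
    stem = PurePosixPath(image_src.split("?")[0]).stem       # drop query string, folders, extension
    stem = re.sub(r"_v\d+$", "", stem, flags=re.IGNORECASE)  # drop an assumed "_v2"-style suffix
    words = re.split(r"[^a-z0-9]+", stem.lower())            # split on anything but letters/digits
    return " ".join(w for w in words if w)


def banner_first_action(custom_attr: Optional[str], image_src: str) -> str:
    """Prefer a descriptive custom attribute; fall back to the cleaned-up file name."""
    if custom_attr and custom_attr.strip():
        return custom_attr.strip().lower()
    return banner_label(image_src)


print(banner_first_action(None, "/img/banners/Greece_Luxury-Deals_v2.jpg"))  # greece luxury deals
```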

Example 2. B2B service site, homepage


Pages like the homepage fulfil multiple functions and cater to different audiences. Not provided makes these sorts of pages difficult to analyse. But here’s the paradox: these pages are often the easiest ones to infer “not provided” keywords for, because they have so many self-selection mechanisms baked right into them. By choosing one link over another, people raise their hand and tell you what topic, interest, and therefore likely keyword they used to get there.

  • First action – Visitor clicks “More” link leading to Reseller hosting
  • What to track – Page path of link visitor clicks on: “reseller hosting”
  • Inferred keyword (landing page + first action) – “reseller hosting”
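
When the thing to track is the page path of the link clicked, a readable label can be derived straight from the URL. A minimal sketch; the URLs are made up, and the `depth` parameter (how many trailing path segments to keep) is a hypothetical knob for sites where a category and a brand each sit in their own segment, as in the washing machine example further down.

```python
from urllib.parse import urlparse


def path_label(href: str, depth: int = 1) -> str:
    """Turn the URL of a clicked link into lowercase words.

    depth is the number of trailing path segments to keep (0 keeps them all).
    """
    segments = [s for s in urlparse(href).path.split("/") if s]
    kept = segments[-depth:] if depth > 0 else segments
    words = " ".join(kept).replace("-", " ").replace("_", " ")
    return " ".join(words.lower().split())  # collapse repeated whitespace


print(path_label("/hosting/reseller-hosting/"))  # reseller hosting
print(path_label("https://www.example.com/washing-machines/beko?sort=price", depth=2))  # washing machines beko
```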


In some cases, the first thing someone does on a landing page can be highly specific and therefore less open to interpretation. In those cases you can infer a “long tail” term with a high degree of confidence.

  • First action – Visitor clicks “How do I transfer a domain into Clook”
  • What to track – Anchor text of link clicked on: “How do I transfer a domain into Clook”
  • Inferred keyword (landing page + first action) – “clook support how do I transfer a domain”

Example 3. Ecommerce site, product category landing page


Even if people switch brands during their visit, the one they choose first is probably the one they searched for. Even if you knew that they had searched for “washing machines”, their first action (e.g. clicking “Beko”) tells you that what they really meant was “Beko washing machines”.

  • First action – Visitor clicks on Beko logo
  • What to track – Page path of link clicked: “washing machines beko”
  • Inferred keyword (landing page + first action) – “beko washing machine”


A price-related filter applied at the beginning of the visit can tell you a great deal about the likely commercial nuances in the keywords someone used.

  • First action – Visitor clicks “under £350” link
  • What to track – Anchor text of link clicked: “under £350”
  • Inferred keyword (landing page + first action) – “washing machine under £350”. In other words, “cheap washing machine”.

Example 4. Ticketing site, event landing page


The landing page will always give you the general topic (i.e. “head term”) someone has searched for. But take into account the active choices people make on that landing page and you can refine that head term into more specific sub-topics.

  • First action – Visitor clicks “238 fan reviews”
  • What to track – Anchor text of link clicked: “238 fan reviews”
  • Inferred keyword (landing page + first action) – “charlie and the chocolate factory review”


  • First action – Visitor clicks link in calendar
  • What to track – Page path of link clicked: “theatre royal drury lane london”
  • Inferred keyword (landing page + first action) – “charlie and the chocolate factory theatre royal drury lane london”. In other words, “charlie and the chocolate factory tickets” (and not reviews)
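
Refining a head term with the first action can start as a handful of hand-written rules. A minimal sketch: the patterns and templates in `SUBTOPIC_RULES` are hypothetical and would need to be written (and validated) for each site; the first matching rule wins, and if none matches, the head term is left as it is.

```python
import re

# Hypothetical rules: (pattern found in the first-action label, keyword template).
SUBTOPIC_RULES = [
    (re.compile(r"\breviews?\b"), "{head} review"),
    (re.compile(r"\btheatre\b|\bcalendar\b"), "{head} tickets"),
    (re.compile(r"\bunder £?\d+"), "cheap {head}"),
]


def refine_head_term(head: str, first_action: str) -> str:
    """Refine the landing page's head term using the first-action label."""
    label = first_action.lower()
    for pattern, template in SUBTOPIC_RULES:
        if pattern.search(label):
            return template.format(head=head)
    return head  # no rule matched: fall back to the head term alone


print(refine_head_term("charlie and the chocolate factory", "238 fan reviews"))
print(refine_head_term("washing machine", "under £350"))  # cheap washing machine
```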

Caveats of using first action for secure search analysis

No long tail

The first behaviour gives more clarity around what the search query might have been, but it will never be as precise, varied or rich in meaning as long tail keyphrases. That richness is inevitably lost.

Dependent on available choices

The first action people take on a landing page is limited to the actions available to them. If there are no options, or very few, for visitors to actively choose from, then you might be clutching at straws. Sometimes it might make sense to bake in some self-selection mechanisms (this can also improve scent and general user experience).

People choose the most appropriate link for their task based on what they see. If links or buttons are poorly labelled (e.g. “click here”), that can interfere with people’s decision process (“which link looks more likely to do what I am here to do?”).

Dominant calls to action can skew intent

Sometimes a single call to action that dominates a landing page is so effective that it masks the visitor’s original intent (which is what we’re concerned with). Many people may click on that dominant call to action even if they have no intention of following through.
That “detour” from their original intent muddles the data. So, in some cases, the first few actions (rather than the first alone) give more insight into what people came looking for originally, and therefore the keywords they are likely to have searched for.
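
One way to act on this is to treat the page’s dominant call to action as a possible detour. A minimal sketch of that heuristic (an assumption, not a tested rule): if the first action was the dominant CTA, look at the next few actions instead; otherwise keep the first action alone. `actions` is assumed to be one session’s interaction labels in the order they happened, and “book now” is a made-up CTA label.

```python
from typing import List, Optional


def intent_actions(actions: List[str], dominant_cta: Optional[str] = None, n: int = 3) -> List[str]:
    """Return the action(s) most likely to reflect the visitor's original intent."""
    if not actions:
        return []
    if dominant_cta is not None and actions[0] == dominant_cta:
        # Treat the CTA click as a detour and look at what the visitor did next.
        return [a for a in actions[1:] if a != dominant_cta][:n]
    return actions[:1]


print(intent_actions(["regions in greece", "santorini"]))  # ['regions in greece']
print(intent_actions(["book now", "santorini", "self catering"], dominant_cta="book now"))
```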

Brand searches difficult to estimate

In some cases it’s reasonably straightforward to use the first action to work out if the visitor searched for the brand name (for example, when the first action is clicking through to help, the contact page, a branch locator, etc.). But these cases are in the minority. There are options, though.
You can add other behavioural analytics into the mix, such as the speed with which people move through the site (brand awareness implies familiarity with the layout and features of the site, unless it is their first visit). I’ll explore these in a future post as they are more complex (and I’m still learning about them myself).

Based on assumptions and inferences

There’s no doubt that this technique is prone to error. You have to start with sensible assumptions (but assumptions nonetheless) about what the first action tells you about someone. You then need to validate those assumptions with data.
But the process has clear benefits. It puts you in your visitor’s shoes, forcing you to work out what their mental context might be. And by their very nature, these assumptions require you to keep referring back to the business model. That’s always a good thing.
If you have access to historical data pre-dating 100% not provided, you can cross-reference many of these “first actions” with actual keyword data. You can work out what the “first action” was for important head terms or brand keywords and use that data to validate or clarify your assumptions when the intent behind a first action is too ambiguous.
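
With historical data, the cross-referencing can be as simple as counting which keywords preceded each first action. A minimal sketch, assuming you can export (keyword, first action) pairs for sessions from before 100% not provided; the rows below are made up for illustration. The share of the top keyword gives a rough sense of how ambiguous each first action is.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def keyword_profile(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Map each first action to a Counter of the keywords that led to it."""
    profile: Dict[str, Counter] = defaultdict(Counter)
    for keyword, first_action in rows:
        profile[first_action][keyword] += 1
    return profile


def top_keyword(profile: Dict[str, Counter], first_action: str) -> Optional[Tuple[str, float]]:
    """Most common keyword for a first action and its share of sessions, or None if unseen."""
    counts = profile.get(first_action)
    if not counts:
        return None
    keyword, hits = counts.most_common(1)[0]
    return keyword, hits / sum(counts.values())


rows = [("beko washing machine", "washing machines beko")] * 3 + [
    ("washing machines", "washing machines beko")]
profile = keyword_profile(rows)
print(top_keyword(profile, "washing machines beko"))  # ('beko washing machine', 0.75)
```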

How to track first actions in Google Analytics

Second Page Visited

As I mentioned earlier, some first behaviour is tracked in GA by default: when the first action taken is a pageview (the visitor lands on page A, then clicks through to page B), page B counts as the first action.
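
To make that definition concrete: for each session, the second pageview in time order is the “first action”. A minimal sketch, assuming you have hit-level rows of (session id, timestamp, page path) from your own logs or an export; the rows below are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def landing_and_second(hits: Iterable[Tuple[str, float, str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {session_id: (landing page, second page or None)} from pageview hits in any order."""
    by_session = defaultdict(list)
    for session_id, ts, path in hits:
        by_session[session_id].append((ts, path))
    result = {}
    for session_id, views in by_session.items():
        views.sort()  # order each session's pageviews by timestamp
        landing = views[0][1]
        second = views[1][1] if len(views) > 1 else None
        result[session_id] = (landing, second)
    return result


hits = [("s1", 12.0, "/greece/regions/"), ("s1", 3.0, "/greece/"), ("s2", 5.0, "/greece/")]
print(landing_and_second(hits))
```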


Google Analytics tracks this under the ga:secondPagePath dimension.
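
If you pull this through the API rather than the reports, a query pairing landing page with second page for Google organic traffic might look like the sketch below. It assumes the Core Reporting API v3 endpoint and its `ids`, `start-date`, `end-date`, `metrics`, `dimensions`, `filters` and `sort` parameters; the view ID and dates are placeholders, and authentication (an OAuth access token) is left out.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"


def second_page_query(view_id: str, start: str, end: str) -> str:
    """Build a Core Reporting API v3 request URL for landing page x second page."""
    params = {
        "ids": f"ga:{view_id}",
        "start-date": start,
        "end-date": end,
        "metrics": "ga:sessions",
        "dimensions": "ga:landingPagePath,ga:secondPagePath",
        "filters": "ga:source==google;ga:medium==organic",  # ';' means AND
        "sort": "-ga:sessions",  # most common combinations first
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(second_page_query("12345678", "2013-09-01", "2013-09-30"))
```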

If the first action is a client-side interaction (e.g. a click on a button) that is tracked as an event, then you need some custom GA tracking. While you can see all the events associated with a page, there is no way of telling whether they occurred on a landing page or, indeed, whether they were the first action taken in that session.

One workaround is to fire ALL interactions as pageviews in order to leverage the Second Page Visited dimension. You would have to use a separate profile or property for this. I’d also use a profile filter to strip out any characters that are not letters or digits, to make the “inferred keyword” easier to read.
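
A minimal sketch of the naming side of this workaround, assuming a hypothetical `/first-action/` prefix for the virtual pageviews: each interaction label becomes a slug made only of letters, digits and hyphens, and turning it back into readable words is the same kind of cleanup the profile filter would perform.

```python
import re

PREFIX = "/first-action/"  # hypothetical prefix for interaction "pageviews"


def virtual_page_path(label: str) -> str:
    """Slug for an interaction label: letters and digits kept, everything else collapsed to '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return PREFIX + slug


def readable_label(path: str) -> str:
    """Drop the prefix and non-alphanumerics to get back a readable 'inferred keyword'."""
    if path.startswith(PREFIX):
        path = path[len(PREFIX):]
    return " ".join(w for w in re.split(r"[^a-z0-9]+", path.lower()) if w)


print(virtual_page_path("Regions in Greece"))  # /first-action/regions-in-greece
print(readable_label("/first-action/regions-in-greece"))  # regions in greece
```

Note that this also drops characters such as “£”, so “under £350” comes back as “under 350”.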

Sequential segmentation

You could try the new sequential segmentation feature, although it doesn’t scale for this sort of stuff. You need the first behaviours recorded as a dimension so that you can manipulate the data and cluster it for analysis, and sequential segmentation is simply not suited to that.

Event tracking

Event tracking would be the first and obvious tracking mechanism for these “substitute keywords”. You could differentiate events fired on landing pages vs other pages like this (untested):

  1. Determine whether the page is a landing page based on document.referrer being Google organic
  2. Append a marker to the event category to identify events occurring on a landing page (e.g. “Regions in greece – np”).
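
The two steps above, sketched in Python for clarity (on the page itself this would be JavaScript reading `document.referrer`). It assumes any `google.*` host counts as Google, and uses a `gclid` parameter on the landing URL as a hypothetical signal for paid clicks, which only holds if AdWords auto-tagging is switched on; treat both as assumptions to check against your own traffic.

```python
import re
from urllib.parse import parse_qs, urlparse

GOOGLE_HOST = re.compile(r"^(www\.)?google\.[a-z.]+$")  # e.g. google.com, www.google.co.uk


def is_google_organic(referrer: str, landing_url: str) -> bool:
    """True if the visit came from a Google host and the landing URL has no gclid (assumed paid marker)."""
    host = (urlparse(referrer).hostname or "").lower()
    paid = "gclid" in parse_qs(urlparse(landing_url).query)
    return bool(GOOGLE_HOST.match(host)) and not paid


def event_category(label: str, on_organic_landing: bool) -> str:
    """Append the ' – np' marker to events fired on a Google organic landing page."""
    return f"{label} – np" if on_organic_landing else label


organic = is_google_organic("https://www.google.co.uk/", "https://example.com/greece/")
print(event_category("Regions in greece", organic))  # Regions in greece – np
```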

However, this gives you ALL of the interactions occurring on that landing page, regardless of order. Remember, we are only concerned with the first action, not just any action taken on the landing page (which could be the 1st or 7th). Only the first one. This is important.
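
Restricting the marker to the first action needs one extra piece of state per page view. A minimal sketch of that logic (again in Python for readability; the class name is hypothetical): the tracker is armed only on a Google organic landing page and disarms itself after tagging the first interaction.

```python
class FirstActionTracker:
    """Tags only the first interaction on a landing page; later ones keep their plain category."""

    def __init__(self, on_organic_landing: bool):
        self.armed = on_organic_landing

    def category_for(self, label: str) -> str:
        if self.armed:
            self.armed = False  # later interactions on this page are not "first"
            return f"{label} – np"
        return label


tracker = FirstActionTracker(on_organic_landing=True)
print(tracker.category_for("Regions in greece"))  # Regions in greece – np
print(tracker.category_for("Santorini"))          # Santorini
```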

At the moment, the secondPagePath dimension in Google Analytics seems the most readily accessible solution for inferring keywords based on actual visitor behaviour.

Thoughts? Comments? Let me know below