Since its launch, Google Analytics (GA) has repeatedly surprised us with innovative new functions, several of which have forced even the most established enterprise tools to follow suit. Take ad-hoc segmentation: it took other providers years to replicate this indispensable analyst’s tool in comparable form. Webtrends only fully entered the world of fast ad-hoc segmentation this year with ‘Webtrends Explore’ – which now even outdoes the segmentation options offered by GA.
Why should I pay?
Given GA’s overwhelming popularity and great functionality, you could be forgiven for asking why any company would still pay for an analytics tool. Years ago, when I first encountered ‘enterprise tools’, I asked myself the same thing. Even though in terms of features the differences are increasingly blurred, there are still a few things that you get only from fee-based tools. And the more I work with these fee-based tools, the more I see that these things make all the difference.
It’s not about data protection
Instead of adding my voice to the chorus of GA cheerleaders, I’d like to break ranks and point out what I think are the biggest flaws in the free version of GA. And by flaws, I’m not referring to the endless data protection debate surrounding Google and the frequent accusation that with GA, you “pay with your data”. Even though that’s the main reason why GA doesn’t make the shortlist at many companies, I want to discuss the things that bother me most as an analyst about daily work with Google Analytics.
1. Sampling
Analytics tools eat up huge amounts of processing power. Despite its unparalleled server capacity, even Google has to use sampling to prevent overloading its servers and ensure that users do not have to wait too long for their GA reports.
Sampling means: instead of reading every line, I check only every 3rd, 4th, 10th or even every 100th or 1,000th line. This works brilliantly if you’re conducting social research. If you want to find out what the Swiss think about the EU, it is generally sufficient to survey around 1,000 Swiss citizens – a tiny fraction of the population – to establish, with a very low margin of error, what percentage wish to remain independent. However, if you would like to know the opinion of a small segment of the population – for example, the Swiss marketing professionals within this random sample of 1,000 citizens – then the sample is too small to produce a reliable result, as it contains too few marketing professionals.
The advantage of an analytics tool is that it always ‘surveys’, or tracks, the entire population: all visitors to my website (or in this example, all Swiss citizens). Thus, every marketing professional in the Swiss population is represented, and you could in theory make representative assertions about their opinions. But GA does not first pick out all the Swiss marketing professionals and then ‘ask’ every 10th of them. Instead, GA speeds through the entire Swiss population, looking at perhaps every 100th person, and only if that person happens to be a marketing professional are they included in the segment being surveyed.
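The difference between segmenting first and sampling first can be made concrete with a small simulation. This is a hypothetical sketch (population size and segment share invented for illustration): it compares the number of segment members available when you keep the whole population versus when you sample every 100th person first, as GA does.

```python
import random

random.seed(42)

# Hypothetical population: 1,000,000 'citizens', about 0.5% of whom
# are marketing professionals (the segment we care about).
population = ["marketer" if random.random() < 0.005 else "other"
              for _ in range(1_000_000)]

# Segment first: every marketer in the population is available, so
# even a modest survey of them can be representative.
marketers = [p for p in population if p == "marketer"]
print(len(marketers))  # roughly 5,000 people available to survey

# Sample first (every 100th person), then segment -- the GA approach
# described above: only the marketers who happen to fall into the
# 1% sample survive, leaving a sliver of the segment.
sample = population[::100]
sampled_marketers = [p for p in sample if p == "marketer"]
print(len(sampled_marketers))  # roughly 50 people -- far too few
```

The second count is two orders of magnitude smaller, which is exactly why sampled reports over small segments become unstable.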
Here’s a classic example: an editor is given the task of tailoring content to better suit the needs of users. To do this, the editor needs the on-site search terms from the last six months with the highest search exit rate (i.e. visits that end on the search results page, probably because nothing useful was found). The data must come from the registered-users segment – which, at 15% of visits, is quite substantial – and include only search terms that were entered at least 20 times. The website has about 500,000 visits per month and generated the following statistics over the last six months:
Avid GA users will be familiar with this scenario: three identical reports, three different sampling rates, three completely different results. With the default sample rate used here (5%), even the number of searches shows how rough the calculation is (bizarrely, every term has been searched for either 57 or 38 times). If you turn the sampling up to three quarters of the maximum rate, the report suddenly looks very different: ‘geld’ is now ranked first, followed by ‘ji da’. If you then switch to GA’s top sampling rate of about 10.3%, ‘geld’ and ‘ji da’ disappear completely, while ‘pumps’ and ‘jura’ suddenly pop up.
The more traffic you have and the more complex the segment, the more GA uses sampling – and this increases the likelihood of the data becoming so skewed that it is ultimately worthless. Given this, owners of high-traffic websites might like to think twice about relying on GA.
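With access to unsampled data, the editor’s query from the example above reduces to a straightforward aggregation. The following is a hypothetical sketch – record structure and field names are invented, and the toy data stands in for six months of real search records:

```python
from collections import defaultdict

# Hypothetical unsampled search log: one record per on-site search by
# a registered user, flagged if the visit ended on the results page
# (a 'search exit'). Field names are invented for illustration.
searches = [
    {"term": "geld", "search_exit": True},
    {"term": "geld", "search_exit": False},
    {"term": "jura", "search_exit": True},
    # ... in reality, six months of unsampled records ...
]

MIN_SEARCHES = 20  # only terms entered at least 20 times

counts = defaultdict(lambda: {"total": 0, "exits": 0})
for row in searches:
    stats = counts[row["term"]]
    stats["total"] += 1
    stats["exits"] += row["search_exit"]  # True counts as 1

# Terms ranked by search exit rate, highest first.
report = sorted(
    ((term, s["exits"] / s["total"])
     for term, s in counts.items() if s["total"] >= MIN_SEARCHES),
    key=lambda pair: pair[1], reverse=True)
```

Run over the full, unsampled records, this produces one stable answer – not three contradictory reports depending on the sampling rate.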
2. Customised support
No other tool has as much user-generated documentation as GA, which is a significant advantage. However, plenty of unique problems still exist that can’t be solved simply with copy & paste. And blog posts are sometimes just full of rubbish, because few bloggers actually understand how GA really works. Apart from Google’s own official documentation, all lay knowledge about GA (including my own) is based purely on observations and assumptions about how GA operates.
For me, it’s important to have a provider that I can email or call when I experience problems with its tool. A provider that can look ‘inside’ the tool and save me days or weeks of trying to work out a solution – or of not finding one at all, because no solution for my particular problem currently exists. Information about the software’s limits is particularly valuable for analysts, as it saves them hours of desperately searching for something they may have missed.
3. Raw data
“Tool A says 400, Tool B only 300 – how is that possible?” This kind of customer enquiry is enough to send chills down an analyst’s spine. After all, a lot can go wrong on the way from the visitor’s browser to the report. Typically, the problem is caused by one of the following issues: 1) tracking is not running properly, 2) data is not gathered/stored correctly, 3) data is not processed properly, 4) reports are not configured correctly, 5) reports are misinterpreted by users.
Some of these problems can be solved quickly using professional debugging tools (diagnostic tools for locating errors). For 2), one feature of fee-based tools is enormously helpful: raw data, i.e. log files or direct data extracts from those log files.
If I can see into the log files, then I can determine whether data has been gathered or not, and thus exclude one of the most common sources of error. If I can’t safely exclude this potential bug, all subsequent testing is marred by that fundamental uncertainty. Most tools thoroughly filter their log files (removing bot traffic, for example) before processing the data into reports. With Google Analytics, you’re always in the dark: you don’t know exactly how or what is being stored.
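Checking the raw log is conceptually simple. Here is a hypothetical sketch – the log format and field layout are invented, since every tool stores hits differently – that counts the hits actually stored and flags the bot traffic a tool would typically filter out before building reports:

```python
import re

# Hypothetical raw hit-log excerpt (format invented for illustration):
# timestamp, client IP, user agent, requested page.
log_lines = [
    '2014-03-01T10:00:01 192.0.2.10 "Mozilla/5.0" /home',
    '2014-03-01T10:00:04 192.0.2.11 "Googlebot/2.1" /home',
    '2014-03-01T10:00:09 192.0.2.12 "Mozilla/5.0" /search?q=jura',
]

LINE = re.compile(r'^(\S+) (\S+) "([^"]*)" (\S+)$')

raw_hits, bot_hits = 0, 0
for line in log_lines:
    m = LINE.match(line)
    if not m:
        continue  # malformed line: collection itself may be broken
    raw_hits += 1
    if "bot" in m.group(3).lower():
        bot_hits += 1  # would be filtered out before reports are built

# If raw_hits matches what the tracking code should have sent, error
# source 2) -- data not gathered/stored -- can be ruled out.
print(raw_hits, bot_hits)  # 3 raw hits, 1 of them bot traffic
```

With GA’s free version there is simply no equivalent file to open and count.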
What’s more, if I enrich the log files with additional data, I can also add past data to my reports. Or, with a tool like Webtrends, I can process the log files again according to different criteria; for example, if I forgot a profile filter and now all the internal traffic has been accidentally added to my live profile. If that happened to me while using GA, my data would be irreversibly damaged.
For me, sampling, support and raw data are absolutely essential. So, perhaps unsurprisingly, I believe it’s these three main things that differentiate the relatively new Google Analytics Premium from the freeware alternatives. However, this version is so expensive (generally with a six-figure price tag) that for small and medium-sized businesses it’s usually not worth the investment. It may be worthwhile instead to take a look at a completely different tool.