Monday, July 21, 2008

Google Analytics, Sampled Data, Entrance Keywords Report Problem

Has anyone ever noticed the “This report is based on sampled data” message in Google Analytics? How accurate is this sampled data? Let's investigate...

Here is what I need to find out:

I need traffic stats for a given page on our site, with the traffic broken down by keyword.


For this URL, I need to see how many people came to this page in March and what keywords they used. Then I need to compare this data to April.

Why do I need this?

We are changing page titles across our site and need to see how the change affects traffic, and in particular how the keywords that bring in this traffic have changed.
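A before-and-after keyword comparison like this is easy to script once the two reports are exported. The following is only a minimal sketch: the function name keyword_deltas, the dict-of-pageviews layout, and every number in it are hypothetical and not taken from our real reports.

```python
def keyword_deltas(before, after):
    """Return {keyword: after - before} for every keyword seen in either period."""
    keywords = set(before) | set(after)
    return {kw: after.get(kw, 0) - before.get(kw, 0) for kw in keywords}


# Hypothetical pageview counts per keyword, for illustration only.
march = {"nassau hotel": 40, "family owned hotel in nassau": 12}
april = {"nassau hotel": 35, "cheap hotel nassau": 9}

# Print the change per keyword; a keyword missing from one month counts as 0.
for kw, delta in sorted(keyword_deltas(march, april).items()):
    print(f"{kw}: {delta:+d}")
```

Keywords that appear in only one of the two months still show up, which matters here because a title change may bring in entirely new keywords.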


So, I went to Analytics and decided to check the accuracy of the report called Entrance Keywords (as this is the report that is meant to show exactly what I need). I generated this report for the above URL for the dates from 1st to 4th July. The generated report was based on sampled data, and it showed 5 pageviews for “family owned hotel in nassau”, which was also the only keyword it showed.

To check the accuracy of this report, I generated another report with only the dates changed, from 2nd to 3rd July. Since the first report covered 1st to 4th July, the new date range is a subset of the first, and thus I was expecting to see the same keyword, “family owned hotel in nassau”, with the same number of pageviews or fewer.

However, the report that was generated was completely different from what I expected: it contained more keywords and more pageviews. This new report did not say that it was based on sampled data, and thus it is supposed to contain more precise statistics. In other words, the latest report is presumably the correct one, and the first one is misleading, as it is missing many pageviews and keywords.
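The check I did by eye can also be written down as a rule: for any keyword, a shorter date range inside a longer one should never show more pageviews than the longer range does. Below is a rough sketch of that rule. The function name subset_violations and the dict layout are my own assumptions, and the 2nd to 3rd July numbers are hypothetical placeholders, not the real values from my report.

```python
def subset_violations(full_range, sub_range):
    """List keywords whose pageviews in the sub-range exceed those in the full range."""
    violations = []
    for keyword, sub_views in sub_range.items():
        # A keyword absent from the full-range report counts as 0 pageviews there.
        full_views = full_range.get(keyword, 0)
        if sub_views > full_views:
            violations.append((keyword, full_views, sub_views))
    return sorted(violations)


# The 1st-4th July figure is the one the sampled report showed.
july_1_to_4 = {"family owned hotel in nassau": 5}
# Hypothetical placeholder values for the 2nd-3rd July report.
july_2_to_3 = {"family owned hotel in nassau": 3, "nassau hotel": 7}

for keyword, full_views, sub_views in subset_violations(july_1_to_4, july_2_to_3):
    print(f"{keyword}: {sub_views} in sub-range vs {full_views} in full range")
```

Any keyword this prints is one where the two reports cannot both be right, which is the situation I ran into.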

So, the conclusion is that the sampling Google Analytics currently uses can give results that are very far from accurate. In fact, if I had not checked this, I would most likely have been misled by this data in my business decisions.

Please leave your feedback in the comments if you have had a similar experience or can shed more light on this issue.

1 comment:

dave said...

I think it is something that you have to accept: nothing is perfect. There are always slight changes when you are trying to get the data for today in your reports.

I agree that the keyword issue is strange, but you will find that a number of metrics change slightly. This can also occur with AdWords campaigns. That is why you use more than one web analytics tool.