What every small business should know about Ghost Traffic in Google Analytics

Why should your small business even care about Ghost Traffic in Google Analytics …let alone know what it is, how it is affecting your business and how to get rid of it?

My purpose in this blog is to explain this phenomenon in as accessible a way as possible, so that small business owners can be better informed and know how to take action if needed.

If you don’t have a website as an important business tool, then you can take the early mark and get back to work.

However, if you have a business website and you are relying on traffic to that website for your leads, enquiries or sales …you should read on. Especially so if you are paying someone to generate traffic to your website.

I have written previously about a case study in which the Google Analytics reports presented to a client by a multinational digital agency were completely corrupted. The figures looked great because they showed impressive traffic to the website. The data also looked genuine because it was an actual report generated directly from Google Analytics. What could possibly be wrong?

Unfortunately, many website owners (large companies and small businesses alike) accept these reports, either because they want to believe them and be impressed, or because they simply don’t know how and why they should question the data.

The fact that reports come from Google Analytics is no reason to blindly accept the data. If you are relying on that website traffic, you must interrogate the data. This blog will show you how.


If you can’t measure it you can’t improve it.

Peter Drucker

What is Google Analytics?

Google Analytics is the most widely used software for measuring traffic to websites. It measures many parameters, such as page views, time spent on each page, the visitor’s geographic location and events such as clicks. This data can then be used to tweak aspects of a website, such as its Search Engine Optimisation, and to measure the effectiveness of those changes.

Every website registered with GA receives a tracking code, a small JavaScript snippet installed in the HEAD of each page of the site. That JavaScript sends data to Google when events happen on the page, e.g. when a visitor clicks on a link in the Google Search results and is directed to the page.
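To make the idea of a “hit” more concrete, here is a minimal sketch of the kind of data string a Universal Analytics tag sends to Google. It assumes the documented Measurement Protocol (version 1) parameters `v` (protocol version), `tid` (tracking ID), `cid` (client ID), `t` (hit type) and `dp` (document path). The tracking ID `UA-000000-1`, the client ID and the page path are placeholders, and the helper name `build_pageview_hit` is hypothetical. The sketch only builds the string and sends nothing.

```python
from urllib.parse import urlencode

# Universal Analytics collection endpoint (Measurement Protocol v1).
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id: str, client_id: str, page_path: str) -> str:
    """Return the URL-encoded payload for a single pageview hit.

    Hypothetical helper for illustration: it only builds the string,
    nothing is sent over the network.
    """
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the property's tracking ID (placeholder here)
        "cid": client_id,    # anonymous client ID normally kept in a cookie
        "t": "pageview",     # hit type
        "dp": page_path,     # path of the page that was viewed
    }
    return urlencode(params)


payload = build_pageview_hit("UA-000000-1", "555.1234", "/contact")
print(f"{COLLECT_URL}?{payload}")
```

Notice that the tracking ID is the only thing in the string that says which GA property the hit belongs to; nothing in it proves that a real browser actually loaded the page.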

Those data are collected in Google Analytics and can be manipulated to produce a wide variety of reports as required. In this blog, I am referring exclusively to Universal Analytics rather than the newer GA4.

What is Ghost Traffic?

Ghost Traffic is fake traffic that can be generated by anyone who knows the unique identifier (the tracking ID) contained in a website’s tracking code. This is VERY simple to obtain, because the tracking code sits in plain view in the source of every page.

A script can then be run on a computer to send fake data directly to the GA property …and it will look just like data generated by living, breathing humans actually clicking through to a page and reading the content.

This video demonstrates data being written directly to a GA property using a simple Python script. The fake hits in this demo are attributed as referrals from Instagram, Facebook and Twitter. Thousands of hits recorded in GA …and yet not a single human visit to the site.

This demo video was created by Dr Augustine Fou and published in this article in his LinkedIn feed. The whole article discusses the need for better cybersecurity in GA and demonstrates the Fou Analytics alternative which I will discuss later in this blog.

So, the analytics and attribution numbers you use to make business decisions are probably not as accurate as you assumed they were.

Dr Augustine Fou

How to get true data from Google Analytics – a real-life example

Can I “block” fake traffic?

Unfortunately, it simply isn’t possible to prevent anyone from writing ghost traffic to your GA property. As Dr Fou discusses in the article referred to above, Google has addressed the need for increased cybersecurity in the next version, GA4. While we continue to use Universal Analytics, we have to live with the fact that we cannot block the deliberate corruption of our data by Ghost Traffic.

That leaves us with two alternatives:

  • we can filter out Ghost Traffic to obtain a measure of true traffic volumes, or
  • we can use a different analytics platform which has inbuilt cybersecurity.

I will now illustrate each of these approaches by comparing the analytics from one day of traffic to the same site. The site is the one discussed in my earlier blog about fake traffic, Impact Panel Works, and we will examine its traffic on 3rd February 2022.

It is impossible to block Ghost Traffic in Google Analytics

Method 1 – filter out Ghost Traffic from Google Analytics

Starting point – raw data

The image above displays one full day of traffic to the site in its “raw data” view. This view shows everything written to the GA property for that 24-hour period, including real traffic, various bots (good and bad) and ghost traffic. We’ll focus on two numbers, 113 users and 332 pageviews, along with the geolocation data (countries) shown in the table to the right.

You’ll notice traffic reported from nine overseas countries as well as local traffic from Australia. These are the same nine countries mentioned in the earlier blog I wrote about fake traffic.

GA custom dimension setup

To filter out the fake, we must first “tag” the real

As we have no access to, or control over, the ghost traffic, the only way to filter it out of the analytics is to “tag” or “mark” the real traffic as it hits the site and to then create a view in GA that only includes this real traffic.

Detailed instructions on one method of doing this can be found in this article by Joey Strawn of Industrial Marketer. The method uses GA’s built-in “Custom Dimensions” to mark genuine visitors. As each visitor loads a page, the tracking code adds an extra piece of data to the hit it sends to Google. In this case, we set up a custom dimension called “GHOST” and record a value of “false”, as these hits are NOT ghost traffic.
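The tagging step can be sketched as follows. This is an illustration of the idea only, not the method from the referenced article: it assumes GHOST was registered as custom dimension index 1, which Universal Analytics hits carry as the parameter `cd1`. The index, the sample hit and the helper name `tag_as_real` are all hypothetical; in practice the value is added by the site’s tracking configuration, not by hand.

```python
from urllib.parse import urlencode

# Hypothetical assumption: GHOST is custom dimension index 1,
# which Universal Analytics hits carry as the parameter "cd1".
GHOST_PARAM = "cd1"


def tag_as_real(hit: dict[str, str]) -> dict[str, str]:
    """Return a copy of a hit with GHOST=false added.

    Only the tracking code running on the genuine pages adds this value,
    so hits written directly to the property by a script would not
    normally carry it.
    """
    tagged = dict(hit)  # leave the original hit untouched
    tagged[GHOST_PARAM] = "false"
    return tagged


hit = {"v": "1", "tid": "UA-000000-1", "cid": "555.1234", "t": "pageview", "dp": "/contact"}
tagged_hit = tag_as_real(hit)
print(urlencode(tagged_hit))
```

The only change from an ordinary hit is the extra `cd1=false` at the end of the data string.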

Create a new GA “view”

In Google Analytics we can create a number of different views of the same data. It is useful to keep a copy of the raw data in a view called, for example, “Raw Data”. The overview timeline and the location table above are from the Raw Data view.

We can now create a separate view with a filter that is set to ONLY INCLUDE data containing the custom dimension GHOST with a value of “false”. Because that value is added only by the tracking code running on the real pages, any data without it cannot have come from a genuine visit to the website. In other words, it is ghost traffic.
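The logic of that include filter can be modelled in a few lines. This is a sketch of the idea only, not how GA’s filter engine is implemented; the sample hits are invented and the `cd1` parameter name assumes GHOST is custom dimension index 1.

```python
# Hypothetical sample of hits as recorded in the "Raw Data" view.
# Only hits sent by the real tracking code carry cd1 (GHOST) = "false".
raw_hits = [
    {"t": "pageview", "dp": "/", "cd1": "false"},
    {"t": "pageview", "dp": "/services"},  # ghost: no GHOST dimension
    {"t": "pageview", "dp": "/contact", "cd1": "false"},
    {"t": "pageview", "dp": "/"},  # ghost: no GHOST dimension
]


def include_real_only(hits: list[dict[str, str]]) -> list[dict[str, str]]:
    """Mimic an INCLUDE filter: keep only hits where GHOST equals "false"."""
    return [h for h in hits if h.get("cd1") == "false"]


filtered = include_real_only(raw_hits)
print(f"raw: {len(raw_hits)} hits, filtered: {len(filtered)} hits")
# raw: 4 hits, filtered: 2 hits
```

Note that the filter tests for the presence of the tag rather than trying to recognise ghost hits directly: ghost traffic is defined by what it lacks, so there is no need to know anything about where it came from.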

It is now possible to compare analytics that include ghost traffic with those that specifically exclude it. Let’s look at the same day’s analytics without the ghost traffic.

Setting up a filter to only include traffic with a value of “false” in the Ghost custom dimension.

A true view of real traffic

The above image is the filtered view of the same data shown in the “raw data” view. You’ll immediately notice some major differences.

Real traffic visited the site only during local “office hours”, 9AM until 6PM. There were only 14 visitors, who requested 134 pageviews. You’ll also notice in the location table that real traffic was recorded only from Australia, the USA, India and the Philippines.
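Putting the two views side by side shows how much of the raw data was ghost traffic. A quick sketch using the figures quoted in this article, treating everything missing from the filtered view as ghost traffic (which is how the filter defines it). Because GA counts users per view, the user figure is an approximation.

```python
# Figures for 3rd February 2022 quoted in this article.
raw = {"users": 113, "pageviews": 332}      # "Raw Data" view
filtered = {"users": 14, "pageviews": 134}  # GHOST = "false" view


def ghost_share(raw: dict[str, int], filtered: dict[str, int]) -> dict[str, float]:
    """Fraction of each metric that the GHOST filter removed."""
    return {m: (raw[m] - filtered[m]) / raw[m] for m in raw}


shares = ghost_share(raw, filtered)
for metric, share in shares.items():
    ghost = raw[metric] - filtered[metric]
    print(f"{metric}: {ghost} of {raw[metric]} were ghost ({share:.1%})")
# users: 99 of 113 were ghost (87.6%)
# pageviews: 198 of 332 were ghost (59.6%)
```

In other words, almost nine in ten of the reported users, and about six in ten of the pageviews, did not come from genuine visits.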

This website belongs to a small local business in suburban Brisbane. It is a recently re-built site and the traffic is slowly building up from zero. While these analytics aren’t as “impressive” as the earlier set, at least they are real and we have a true benchmark to measure against as we work on the site to drive more traffic to it.

Method 2 – use Fou Analytics instead of Google Analytics

A better Analytics platform

For our second method of obtaining real analytics, I am going to introduce you to a better analytics platform …one which has inbuilt cybersecurity and is purpose-built to display the exact analytics of real traffic to the site.

This platform is called Fou Analytics and was custom-built by Dr Augustine Fou. Based in New York, Dr Fou has worked as an ad fraud researcher for 25+ years, helping many major digital advertisers obtain real analytics so they can measure the performance of their advertising. Most of us would be amazed at the degree of fraud in the digital advertising business. I continue to be amazed as I read what the “bad guys” are doing and getting away with.

Ten years ago, Dr Fou built his own set of tools for this work because of the deficiencies of all the available platforms. While these tools are purpose-built to analyse sites with very large traffic volumes, primarily to detect fraudulent activity in the digital advertising industry, they work incredibly well for our purposes as small business owners wanting real traffic data for our low-traffic websites.

Aligns with real traffic view

My intention here is to give you a very, very brief introduction to Fou Analytics by using the real life example we have been discussing.

We will see that the analytics measured by Fou Analytics closely agree with those produced by Google Analytics once the ghost traffic has been filtered out.

Fou Analytics measures many parameters to calculate and categorise traffic on a scale ranging from “confirmed problematic” (displayed in RED) through to “confirmed no problems” (displayed in BLUE). In between are separate categories for “search” such as Google (displayed in YELLOW) and “declared” or named (displayed in ORANGE).

The data are displayed in a number of timelines, charts and tables …all of which can be drilled into for more detailed analysis. We’ll focus on just three of them: the timeline of the whole day, the breakdown of traffic into the main categories, and the chart showing the unique “fingerprint” of every individual site visitor.

This first image is the equivalent of the Google Analytics timeline and shows that traffic was recorded only between 0900 and 1800, the same as we saw in the filtered view of Google Analytics once the ghost data was removed.

The vertical scale to the right summarises the day’s traffic. Of all the traffic, 30% was confirmed as genuine hits from reliable sources and 56% came from sources that were probably not problematic at all. Only 4% looked at all suspicious, and no traffic was confirmed as problematic. There was one hit from a named bot and a very small number of hits from search engines.

This chart shows more detail of the categories from “confirmed problematic” through to “confirmed no problems”.

By drilling down from this chart and the table of fingerprints shown in the column to the right, I was able to identify two visits to the site from different Google servers in the USA and one from an Amazon server in the USA, and to establish that the 5 “suspicious” hits came from identifiable servers in India and the Philippines.

This table gives details of every visitor and the number of pageviews attributed to each. Each visitor is identified by a unique “fingerprint”, which allows further drilling down into the details of that visit. Information such as location, ISP, IP address, computer platform and screen size is available for examination.

Notice also that the 135 pageviews by 16 unique visitors are very close to the 134 pageviews by 14 visitors recorded in the filtered view of Google Analytics.
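How close is “very close”? A small sketch expressing the gap as a relative difference, using the figures quoted in this article. Taking the filtered GA view as the baseline is an arbitrary choice, and the two platforms likely count visitors differently (GA by client ID, Fou Analytics by fingerprint), so exact equality isn’t expected.

```python
# Figures quoted in this article for 3rd February 2022.
ga_filtered = {"visitors": 14, "pageviews": 134}
fou = {"visitors": 16, "pageviews": 135}


def relative_difference(baseline: int, other: int) -> float:
    """Gap between two counts as a fraction of the baseline."""
    return abs(other - baseline) / baseline


diffs = {m: relative_difference(ga_filtered[m], fou[m]) for m in ga_filtered}
for metric, diff in diffs.items():
    print(f"{metric}: GA {ga_filtered[metric]} vs Fou {fou[metric]} ({diff:.1%} apart)")
# visitors: GA 14 vs Fou 16 (14.3% apart)
# pageviews: GA 134 vs Fou 135 (0.7% apart)
```

Pageviews agree to within 1%. The visitor counts differ by just two, which only looks large as a percentage because the numbers are so small.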

Summary

I trust that I’ve shown that even small business owners need to be aware that not all traffic shown in Google Analytics can be trusted at face value. If your business depends on website traffic for any purpose, such as selling products, generating leads or showing advertisements, you absolutely must interrogate the analytics to uncover the “real” traffic.

The “raw” analytics data presented in Google Analytics is also called “vanity metrics”, as those who take it at face value usually do so because they are impressed by the data and really want to believe it is true.

Particularly if you are paying someone to drive traffic to your site through SEO or pay-per-click advertising, you should insist on a detailed breakdown of the analytics data presented to justify the work done. It may not look as impressive, but you must be confident that you have a measure of real traffic rather than fake traffic.

In conclusion, I’d suggest that you also become familiar with the work of Dr Augustine Fou who generously shares his insights in his LinkedIn profile. As you read his articles, you’ll find it eye-opening to discover what’s actually happening in the area of digital fraud.

As always, if you need help with anything discussed in this article, don’t hesitate to contact me directly.
