The Story Behind This Case Study
A client recently saw a spike in direct traffic that didn’t quite stack up:
This seemed odd, as there hadn’t been any marketing, online or offline, that would explain the uplift. Looking into it further, I could see that this traffic had extremely low engagement:
When the bounce rate is this high and the session duration this low, you can be fairly sure you’re looking at Google Analytics spam from a spambot of some sort. Whilst Google is generally good at blocking spambot traffic, in the never-ending war against spam some will inevitably slip through the net.
So, the next step is to work out where this traffic is coming from and filter it out of your view so that it doesn’t skew your data going forward. Unfortunately, a filter won’t remove data retrospectively, but you can use a segment if you want to view historical data with the spam removed.
How Do You Identify Analytics Spam?
By reviewing various dimensions, you should be able to identify a common denominator for GA spam. Most often you will find it comes from a hostname other than your own domain, or with a language setting you wouldn’t normally expect to see.
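The Python sketch below illustrates the hostname check. It assumes you have exported the Hostname values from a report into a list. `VALID_HOSTNAMES`, `suspicious_hostnames` and the example.com domains are hypothetical placeholders and are not provided by Google Analytics.

```python
# Hypothetical: the hostnames your own tracking code legitimately runs on.
VALID_HOSTNAMES = {"example.com", "www.example.com"}


def suspicious_hostnames(hostnames: list[str]) -> list[str]:
    """Return the hostnames in a report that are not one of your own domains."""
    return [h for h in hostnames if h.lower() not in VALID_HOSTNAMES]


# Made-up values standing in for an exported Hostname report.
report = ["www.example.com", "(not set)", "spam-site.example"]
print(suspicious_hostnames(report))  # ['(not set)', 'spam-site.example']
```

Anything this returns is a candidate for a closer look, not automatic proof of spam.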
In this case, we identified the source as a city called Ashburn in the USA.
Ashburn, as it happens, is home to a number of big US tech companies, including an Amazon data centre. That makes it a likely home for spambots running on hosted servers.
As the client is a tech company based in the US, we had to be sure this traffic wasn’t genuine, but as you can see from the screengrab above, the engagement is low enough for us to decide that traffic from this city should be blocked.
To do this, set up a Google Analytics filter that excludes all traffic from the city: a Custom filter of type Exclude, with the Filter Field set to City and the Filter Pattern set to Ashburn.
Further to this, another UK-based client saw a spike in traffic from Chicago, which again had low engagement, so again a filter was set up as above. However, as city-based spam seems to be on the rise, you don’t really want to set up a separate filter for each city that is causing issues.
How Do You Set Up Filters For Multiple Cities?
To do this, simply list all the offending locations in a single filter pattern, separated by a pipe symbol (|), which in a regular expression effectively means “or”. For example: Ashburn|Chicago
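The Python sketch below shows how the pipe behaves. It builds the same kind of pattern and tests a few city names against it. It assumes the filter pattern is treated as a regular expression that can match anywhere in the field, which `re.search` approximates. Python's regex engine is not the one Google Analytics uses, so treat this as illustrative only. `SPAM_CITIES` and `is_excluded` are hypothetical names.

```python
import re

# The two cities from this case study; extend the list with your own offenders.
# These names contain no regex metacharacters, so they are joined as-is.
SPAM_CITIES = ["Ashburn", "Chicago"]
city_pattern = "|".join(SPAM_CITIES)


def is_excluded(city: str) -> bool:
    """Approximate an exclude filter: True if the pattern matches anywhere in the value."""
    return re.search(city_pattern, city) is not None


print(city_pattern)                    # Ashburn|Chicago
print(is_excluded("Ashburn"))          # True
print(is_excluded("London"))           # False
print(is_excluded("Chicago Heights"))  # True, because the match is unanchored
```

If a partial match on a name like “Chicago Heights” is a concern, anchoring the pattern as ^(Ashburn|Chicago)$ should restrict it to exact city names.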
What If There Is Genuine Traffic From Ashburn or Chicago?
If your site is not US-based and generally doesn’t serve US visitors, chances are the above filter will resolve your issues. However, if you get genuine traffic from these locations, you don’t want a city-based filter that blocks real-life visitors.
In that case, go to the relevant report in Google Analytics (here, Audience -> Geo -> Location), click through to the city sending the spam, then add a secondary dimension of “Network Domain”. This is essentially a lookup of the user’s IP address to determine their ISP.
From this, you should be able to identify the source of your spam:
In the above example, we identified the top two networks, “unknown.unknown” and “relatively.com”, as the culprits: they sent high volumes of visits with average session durations of a second or less.
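The Python sketch below expresses the same judgement for rows exported from the Network Domain report. It flags networks with plenty of sessions but an average duration of a second or less. The one-second cut-off comes from the case study. `MIN_SESSIONS`, the `NetworkRow` type and the sample numbers are hypothetical assumptions, not the client's data.

```python
from typing import NamedTuple


class NetworkRow(NamedTuple):
    network_domain: str
    sessions: int
    avg_session_seconds: float


# Hypothetical thresholds: "high volume" is a guess you should tune;
# "a second or less" comes from the case study above.
MIN_SESSIONS = 100
MAX_AVG_SECONDS = 1.0


def culprit_networks(rows: list[NetworkRow]) -> list[str]:
    """Return network domains with lots of sessions but near-zero engagement."""
    return [
        row.network_domain
        for row in rows
        if row.sessions >= MIN_SESSIONS and row.avg_session_seconds <= MAX_AVG_SECONDS
    ]


# Made-up numbers for illustration only; they are not the client's data.
rows = [
    NetworkRow("unknown.unknown", 900, 0.5),
    NetworkRow("relatively.com", 600, 0.8),
    NetworkRow("example-isp.net", 40, 120.0),
]
print(culprit_networks(rows))  # ['unknown.unknown', 'relatively.com']
```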
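Here, you can create a filter that excludes by “ISP Domain”. These domains contain full stops, and in a regular expression a full stop matches any single character. Place a backslash before each one, so the filter pattern is as follows: unknown\.unknown|relatively\.com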
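The Python sketch below shows why the backslashes matter. It uses Python's `re.escape` to add them and compares escaped and unescaped patterns against a made-up value, "relativelyXcom". As before, `re.search` only approximates how Google Analytics applies a filter pattern. `SPAM_NETWORKS` is a hypothetical name.

```python
import re

# The two network domains identified above.
SPAM_NETWORKS = ["unknown.unknown", "relatively.com"]

# re.escape puts a backslash before each full stop (and any other metacharacter).
isp_pattern = "|".join(re.escape(domain) for domain in SPAM_NETWORKS)
print(isp_pattern)  # unknown\.unknown|relatively\.com

# Unescaped, "." matches any single character, so it can also hit other values.
print(re.search("relatively.com", "relativelyXcom") is not None)  # True
print(re.search(isp_pattern, "relativelyXcom") is not None)       # False
print(re.search(isp_pattern, "relatively.com") is not None)       # True
```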
Of course, you may now run the risk of blocking genuine traffic from other sources, so be sure to check whether any other traffic is coming in from the networks you wish to block.
If you do find that the spam is coming from a genuine network, look for other potentially spammy dimensions such as language or hostname. Generally, you will find something about spam traffic that you can filter on and block.
Ideally, you would create a filter that excludes on a combination of dimensions, e.g. City + ISP. However, a standard exclude filter in Google Analytics matches on only one field, which is a bit of a shame really. And you can be fairly sure the spammers know this and make themselves as difficult to block as they can.
And so the war against spam goes on. Hopefully the above solution will help you win this particular battle.