If you are a completely new user of Google Analytics then this article will prepare you to remove spam from your Google Analytics reports.
If you already use Google Analytics, you may not know it yet, but it is very possible that your reports are inaccurate because of something called Referral Spam.
Two Common Types of Referral Spam
1. Spammy Web Crawlers are just like normal web crawlers that visit your site to index your content. However, these crawlers do not identify themselves as bots as they should, so they are mistakenly included in your Google Analytics reports.
Semalt is the name of a popular spammy web crawler. It spends no time on your website, has a 100% bounce rate (it exits immediately) and only visits one page.
Even though it is not harmful to your actual website, it does harm the accuracy of your Google Analytics reports.
2. Ghost Referral Spam is traffic that never actually hits your website. This is caused by spammers using an exploit to create fake visits (referral and direct traffic) in your Google Analytics reports.
The really annoying thing about ghost referral spam is that it can even fake referrals from organic search results as shown in the image below.
Unfortunately, you cannot block ghost referral spam by modifying your .htaccess file, because these spammers never actually visit your site.
How to Remove Spammy Web Crawlers from Future Google Analytics Reports
I am going to show you how to add 3 different filters in Google Analytics to filter out the most common spammy web crawlers from your reports.
1. Log in to your Google Analytics account and click on the Admin menu at the top.
2. Click on All Filters or Filters (they are both the same) as shown in the screenshot above.
3. Click on the red +NEW FILTER box.
4. In Filter Name, type the name you would like to give this filter. I used the name Block Spammy Crawlers.
5. Under Filter Type select Custom and a Filter Field should appear.
6. In the Filter Field select Referral.
7. In the Filter Pattern, copy and paste the pattern below.
8. Your filter should now look like the image below. Scroll to the bottom and click on the blue Save button to create your filter.
9. Because the Filter Pattern field has a maximum length of 255 characters, we will need to create another filter.
Repeat the process to create another filter named Block Spammy Crawlers 2 (or whatever you prefer) and copy and paste the pattern below into Filter Pattern.
10. Now create one last filter and copy and paste the pattern below into Filter Pattern.
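If you keep your own list of spammy crawler domains, the 255 character limit is the reason the list has to be spread over several filters. The Python snippet below is only a rough sketch of that splitting step. The function name build_filter_patterns and every domain other than semalt.com are made up for illustration, and the output is not the set of patterns used in this article.

```python
MAX_PATTERN_LENGTH = 255  # Filter Pattern character limit mentioned above


def build_filter_patterns(domains, max_length=MAX_PATTERN_LENGTH):
    """Join domains with '|' into as few patterns as fit within max_length."""
    patterns = []
    current = ""
    for domain in domains:
        # Escape dots so "." matches a literal dot instead of any character.
        escaped = domain.strip().replace(".", r"\.")
        if not escaped:
            continue
        if len(escaped) > max_length:
            raise ValueError(f"{domain!r} is too long for a single pattern")
        candidate = f"{current}|{escaped}" if current else escaped
        if len(candidate) <= max_length:
            current = candidate
        else:
            patterns.append(current)  # current pattern is full, start a new one
            current = escaped
    if current:
        patterns.append(current)
    return patterns


# semalt.com is the crawler named earlier; the other domains are placeholders.
domains = ["semalt.com"] + [f"spam-crawler-{i}.example" for i in range(40)]
for number, pattern in enumerate(build_filter_patterns(domains), start=1):
    print(f"Filter {number} ({len(pattern)} characters): {pattern}")
```

Each printed line would go into the Filter Pattern of its own filter. No pattern starts or ends with a vertical line, and none contains spaces.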
How to Remove Spammy Web Crawlers from Past Google Analytics Reports
The filters we created above will filter out spammy web crawlers from future reports. However, data that was collected before the filters were applied will not be affected.
To filter out the spam from this data we need to create a Segment.
1. If you have just completed the above instructions you should still be under the Admin menu of Google Analytics.
Go to the top of the page and click on Home.
2. Under your Website, click on All Web Site Data.
3. Click on +Add Segment.
4. Click on the red +NEW SEGMENT button.
5. Click on Conditions.
6. Click on Include ▼ and change it to Exclude.
7. Click on Ad Content and change it to Source.
8. Click on contains and change it to match regex.
9. In the textbox, copy and paste the filter below.
10. Click on OR and add the filter below.
11. Click on OR once more and add the filter below.
12. Name the Segment (I named mine Filtered – No Spam) and click on the blue Save button.
13. You will now have 2 graphs showing. One is All Sessions and the other is the Segment you just created.
15. You should now be looking at just the Filtered data.
If you would like to add another Segment side-by-side simply click on +Add Segment.
If you would like to only display one Segment and view your All Sessions Segment again then click on the Segment you just created, select All Sessions and press the blue Apply button.
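To get a feel for what the Exclude condition does, here is a small Python sketch that applies the same idea to a handful of sessions. It assumes that a pattern matches when it is found anywhere in the Source value. The function name split_sessions, the sample sessions and the two patterns are all hypothetical and are not the patterns from the steps above.

```python
import re


def split_sessions(sessions, spam_patterns):
    """Return (kept, excluded); a session is excluded if its source matches any pattern."""
    compiled = [re.compile(pattern) for pattern in spam_patterns]
    kept, excluded = [], []
    for session in sessions:
        # Assumption: a partial match anywhere in the source counts as a match.
        is_spam = any(regex.search(session["source"]) for regex in compiled)
        (excluded if is_spam else kept).append(session)
    return kept, excluded


# Made-up sample data; each OR condition in the Segment is one pattern here.
sessions = [
    {"source": "google", "sessions": 120},
    {"source": "semalt.semalt.com", "sessions": 35},
    {"source": "spam-crawler.example", "sessions": 12},
]
spam_patterns = [r"semalt\.com", r"spam-crawler\.example"]

kept, excluded = split_sessions(sessions, spam_patterns)
print("Kept:", [s["source"] for s in kept])
print("Excluded:", [s["source"] for s in excluded])
```

The kept list plays the role of the filtered Segment, while the full list corresponds to All Sessions.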
How to Remove Ghost Referral Spam from Google Analytics Reports
Unfortunately, there is not yet a way to completely remove or block Ghost Referral spam, but you can filter most of it out by creating a filter based on Hostnames.
The first thing we need to do is detect the Ghost Referrals that are messing up our reporting data.
1. Go to your Google Analytics account and click on your website's All Web Site Data.
2. Click on Acquisition, located on the left-hand side.
3. Click on All Traffic and then on Source/Medium.
4. On the right-hand side of Source/Medium click on Secondary dimension, type in Hostname and click on it.
You should now be able to see your Source/Mediums and Hostnames like in the screenshot below.
Red Source/Medium – These are typical Spam Ghost Referrals.
Orange Hostnames – These show that the spammers have not set any Hostnames.
Blue Hostname – This is an example of a smart spammer that has done a good job of faking the Hostname.
Unfortunately these spammers cannot be blocked by the filter we are about to set as the Hostname ‘seems’ legit.
Luckily for us, these smart spammers are not so common.
This sneaky ghost referral is using a legitimate hostname – theguardian.com. However, if you look carefully you will notice that the source is actually theguardlan.com and not theguardian.com.
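Spotting a lookalike such as theguardlan.com by eye is easy to miss, so here is a rough Python sketch of how you could flag sources that are almost, but not exactly, a domain you trust. It uses difflib from the standard library. The function name find_lookalike, the 0.9 threshold and the trusted list are my own assumptions and have nothing to do with Google Analytics itself.

```python
from difflib import SequenceMatcher


def find_lookalike(source, trusted_domains, threshold=0.9):
    """Return the trusted domain that source closely imitates, or None."""
    source = source.lower()
    if source in trusted_domains:
        return None  # exact match, so this is the real domain
    for trusted in trusted_domains:
        # ratio() is 1.0 for identical strings and drops as they differ.
        similarity = SequenceMatcher(None, source, trusted).ratio()
        if similarity >= threshold:
            return trusted
    return None


trusted = ["theguardian.com"]  # hypothetical list of domains you recognise
for source in ["theguardlan.com", "theguardian.com", "google.com"]:
    match = find_lookalike(source, trusted)
    if match:
        print(f"{source} looks like a fake version of {match}")
    else:
        print(f"{source} is not flagged")
```

A check like this only helps you notice suspicious sources in an exported report. It does not block anything.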
Filtering by Hostname
Before I show you how to apply this filter, you need to know that if you do not identify all of your valid hostnames, you might exclude real traffic.
For example, valid hostnames can include translate.googleusercontent.com (Google Translate), webcache.googleusercontent.com (the Google cache version of your website), www.youtube.com and anything else you think is legitimate.
Setting up Hostname Filters in Google Analytics
1. Go to the Admin area of your Google Analytics account.
2. Click on All Filters or Filters.
3. Click on the +NEW FILTER button.
4. Enter a Filter Name (I named mine Hostnames).
5. Under Filter Type select Custom.
6. Tick Include and select Hostname in the Filter Field.
7. Type in the Filter Pattern to include only your legitimate Hostnames.
When creating a Filter Pattern you can separate your Hostnames by using | (a vertical line).
You can also use the regex .* (dot and asterisk) in front of a Hostname to match all of its subdomains.
Do not put a vertical line at the end of your filter pattern and do not include any spaces.
8. Press the blue Save button.
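Before saving, it can help to check your pattern against a few hostnames. The Python sketch below builds a pattern following the rules above (vertical lines between hostnames, .* in front, no spaces, no trailing vertical line) and tests it. The function name build_hostname_pattern and example.com are placeholders, and I escape the dots as an extra precaution, which the steps above do not require.

```python
import re


def build_hostname_pattern(hostnames):
    """Build an include pattern such as '.*example\\.com|.*other\\.com'."""
    parts = []
    for hostname in hostnames:
        hostname = hostname.strip()  # no spaces allowed in the pattern
        if not hostname:
            continue  # skipping blanks avoids a stray vertical line
        # ".*" in front matches subdomains; dots are escaped to match literally.
        parts.append(".*" + hostname.replace(".", r"\."))
    return "|".join(parts)


# example.com stands in for your own website.
pattern = build_hostname_pattern(["example.com", "translate.googleusercontent.com"])
print(pattern)

for hostname in ["www.example.com", "translate.googleusercontent.com", "(not set)"]:
    result = "included" if re.search(pattern, hostname) else "filtered out"
    print(f"{hostname}: {result}")
```

Keep in mind that a pattern like this still lets through spammers who fake one of your valid Hostnames, as described earlier.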
Unfortunately, at this point in time there is no way to completely block these annoying spammers who ruin our Google Analytics data. At least you now know how to filter out the majority of these Spammy Web Crawlers and Ghost Referrers.
Hopefully in the near future Google will release a way to remove this annoying spam once and for all.
If you would like more in-depth information about removing Spam from your Google Analytics Reports, I recommend that you take a look at the article Remove Spam from Google Analytics Reports Guide.