How to Exclude Data from Google Analytics

Recently I was reviewing some Google Analytics data for another site and noticed some anomalies. While we often rejoice at getting a traffic bump, it doesn’t help if the traffic is garbage. We had a lot of traffic coming from Boardman, Oregon, and those visitors left as soon as they hit the site. In this case, we wanted to keep that traffic from showing in our Google Analytics reports.

After doing further analysis, I suspected the traffic was related to one of our vendors. When I inquired, they suggested I just set up an analytics filter to remove that location. I saw several problems with this approach:

  1. Google Analytics filters are destructive: they permanently change the data a view collects, so they should be used carefully.
  2. There was probably good traffic coming from Boardman.
  3. The problem traffic didn’t just arrive this week…it had been festering for a bit.

My preference was to create a custom segment and exclude it from reporting.

Finding Outliers and Patterns

Usually, when strange stuff shows up in Google Analytics, I like to go through the various reports and see what stands out. Sometimes my custom alerts, or even Google’s Insights, will spot these anomalies. In traversing the reports, I noticed that Oregon had a traffic spike. This clearly showed on the Audience > Geo > Location report.

When I clicked on Oregon, I could see that Boardman accounted for most of the state’s traffic, and it had a bounce rate of 100%.

Location report showing traffic from Boardman, Oregon.
Traffic spike from Oregon
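
The same drill-down can be approximated offline once the Location report is exported. What follows is only a minimal sketch: the rows, the function names, and the 50% share and 95% bounce thresholds are all hypothetical, chosen to illustrate the idea of flagging a city that dominates traffic and bounces almost every time.

```python
# Hypothetical rows, shaped like an export of the Location report:
# (city, sessions, bounces). These numbers are made up for illustration.
rows = [
    ("Boardman", 480, 480),
    ("Portland", 60, 27),
    ("Eugene", 20, 9),
]


def summarize_by_city(rows):
    """Return {city: (share_of_sessions, bounce_rate)}."""
    total = sum(sessions for _, sessions, _ in rows)
    summary = {}
    for city, sessions, bounces in rows:
        share = sessions / total if total else 0.0
        bounce_rate = bounces / sessions if sessions else 0.0
        summary[city] = (share, bounce_rate)
    return summary


def flag_outliers(summary, min_share=0.5, min_bounce=0.95):
    """Cities that dominate the traffic AND bounce almost every time."""
    return sorted(
        city
        for city, (share, bounce) in summary.items()
        if share >= min_share and bounce >= min_bounce
    )


summary = summarize_by_city(rows)
suspects = flag_outliers(summary)
print(suspects)
```

A flagged city is a lead, not a verdict. As the next sections show, the city alone was too coarse a signal to act on.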

Use Dimensions to Find More Clues

At this point, I’m not throwing out the whole city as there could be good traffic from there. I thought there was something else I hadn’t seen. What was special about this Boardman traffic?

I decided to go through other dimensions to see if anything else surfaced. One item that stood out was the Network Domain: amazonaws.com. Guess where one of Amazon’s data centers is located? While Amazon has many data center locations, one is in Boardman.

Network Domain dimension showing traffic is from amazonaws.com

Apart from Amazon having a data center there, I still didn’t see the connection. The other oddity was they all had the same non-standard screen resolution of 800×600.

Don’t Jump to Conclusions

In this early stage, it’s easy to conclude there is some rogue bot or process that is messing with you. Continuing to try different dimensions, I used Source / Medium and saw that I knew the enemy, so to speak.

The Source / Medium column revealed that our Boardman traffic was somehow related to our email service provider (ESP) and CRM. I wish I could say more, but the service provider is still researching this issue.

Once I homed in on the email service provider, I could go back and see that this problem had been around for a while. The jumps in Boardman traffic corresponded to the days when we sent emails.
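
One way to check that kind of correspondence is to compare spike days against the send calendar. This is a rough sketch under stated assumptions: the daily counts and send dates are invented, and "spike" is arbitrarily defined here as more than three times the median day.

```python
from datetime import date
from statistics import median

# Hypothetical daily session counts for the suspect traffic (not real data).
daily_sessions = {
    date(2021, 3, 1): 10,
    date(2021, 3, 2): 12,
    date(2021, 3, 3): 150,
    date(2021, 3, 4): 9,
    date(2021, 3, 5): 11,
    date(2021, 3, 6): 140,
    date(2021, 3, 7): 8,
}

# Hypothetical days on which an email campaign went out.
email_send_dates = {date(2021, 3, 3), date(2021, 3, 6)}


def spike_days(daily, factor=3.0):
    """Days whose session count exceeds `factor` times the median day."""
    baseline = median(daily.values())
    return {day for day, count in daily.items() if count > factor * baseline}


spikes = spike_days(daily_sessions)
unexplained = spikes - email_send_dates  # spikes with no email that day
quiet_sends = email_send_dates - spikes  # emails that produced no spike

print(sorted(spikes), sorted(unexplained), sorted(quiet_sends))
```

If both `unexplained` and `quiet_sends` come back empty, the spikes line up with the sends, which supports the email connection without proving the cause.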

Include and Exclude Segments

At this point, we could’ve built a Google Analytics filter to exclude this traffic in the future. If you’re not familiar with Google Analytics filters, you can check out this nice article from Loves Data. However, we also needed to clean up the past data, and filters aren’t retroactive. Without that cleanup, our comparisons against earlier periods would be off.

I decided one approach would be to create two segments. The first would include the Boardman traffic based on the conditions I observed. This segment quantified how much fluff traffic was coming from Boardman.

However, I also wanted to build a complement segment that excluded this same traffic. This new analytics segment would primarily be used in reporting. If done correctly, the two segments would add up to 100% of All Users.

Custom segment excluding Boardman amazonaws traffic.
Building a custom segment
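
The include/exclude pairing can be sketched in code to show why the two segments must sum to the whole. This is an illustration only: the exact segment conditions are not spelled out above, so the sketch assumes city, network domain, and screen resolution together, and the `Session` type and sample sessions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    city: str
    network_domain: str
    screen_resolution: str


def is_suspect(session):
    """Assumed conditions: all three observed traits must match."""
    return (
        session.city == "Boardman"
        and session.network_domain == "amazonaws.com"
        and session.screen_resolution == "800x600"
    )


# Hypothetical sessions, including a legitimate Boardman visitor.
sessions = [
    Session("Boardman", "amazonaws.com", "800x600"),
    Session("Boardman", "amazonaws.com", "800x600"),
    Session("Boardman", "example-isp.net", "1920x1080"),
    Session("Portland", "example-isp.net", "1366x768"),
]

boardman_segment = [s for s in sessions if is_suspect(s)]
non_boardman_segment = [s for s in sessions if not is_suspect(s)]

# Because one segment is the exact negation of the other, every session
# lands in exactly one of them and the two always add up to the total.
assert len(boardman_segment) + len(non_boardman_segment) == len(sessions)
print(len(boardman_segment), len(non_boardman_segment))
```

Note that the legitimate Boardman visitor stays in the reporting segment, which is the advantage over filtering out the whole city.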

Until we can get a resolution and/or directions from our vendor, we use the Non-Boardman segment instead of All Users.

Lessons Learned

I learned several lessons from this issue. The first is that there is something to that adage, “location, location, location.” The second is that aggregate numbers can hide problems in the data. Finally, remember that even when you don’t know the cause of suspect traffic, you can always build a segment to exclude it.