Recently I was reviewing our Google Analytics data and noticed some anomalies. While we often rejoice at getting a traffic bump, it doesn’t help if the traffic is garbage. In this case, we had a lot of traffic coming from Boardman, Oregon. As soon as these visitors hit the site, they were gone.
After doing further analysis, I suspected the traffic was related to one of our vendors. When I inquired, they suggested I just set up an analytics filter to remove that location. I saw several problems with this approach:
- Filters can be destructive, so they should be used carefully.
- There was probably good traffic coming from Boardman.
- The problem traffic didn’t just arrive this week; it had been festering for a while.
My preference was to create a custom segment that excluded the problem traffic from reporting.
Finding Outliers and Patterns
Usually, when something strange shows up in Google Analytics, I go through the various reports to see what stands out. Sometimes my custom alerts or Google’s Insights flag these anomalies. While going through the reports, I noticed that Oregon had a traffic spike, which showed clearly in the Audience > Geo > Location report.
When I clicked on Oregon, I could see that Boardman accounted for most of the state’s traffic. It also had a bounce rate of 100%.
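The same check can be run outside the interface on exported report rows. Here is a minimal sketch, assuming a hypothetical export of (city, sessions, bounces) rows; the numbers, function names, and thresholds are invented for illustration and are not from our account.

```python
from collections import defaultdict

# Hypothetical export rows: (city, sessions, bounces). All numbers are invented.
rows = [
    ("Boardman", 400, 400),
    ("Portland", 120, 54),
    ("Eugene", 40, 22),
    ("Boardman", 100, 100),
]

def summarize_by_city(rows):
    """Return {city: (sessions, share_of_total, bounce_rate)}."""
    sessions = defaultdict(int)
    bounces = defaultdict(int)
    for city, session_count, bounce_count in rows:
        sessions[city] += session_count
        bounces[city] += bounce_count
    total = sum(sessions.values())
    return {
        city: (count, count / total, bounces[city] / count)
        for city, count in sessions.items()
        if count > 0  # skip empty rows to avoid dividing by zero
    }

def suspicious_cities(summary, min_share=0.5, min_bounce=0.95):
    """Cities that dominate the traffic AND bounce almost every time."""
    return sorted(
        city
        for city, (_, share, bounce_rate) in summary.items()
        if share >= min_share and bounce_rate >= min_bounce
    )

summary = summarize_by_city(rows)
print(suspicious_cities(summary))
```

The thresholds are arbitrary starting points. The idea is only to surface a location that combines a large share of traffic with a near-total bounce rate, which is the pattern Boardman showed.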
Use Dimensions to Find More Clues
At this point, I’m not throwing out the whole city as there could be good traffic from there. I thought there was something else I hadn’t seen. What was special about this Boardman traffic?
I decided to go through other dimensions to see if anything else surfaced. One item that stood out was the Network Domain – amazonaws.com. Guess where one of Amazon’s data centers is located? While Amazon has a number of data center locations, one is Boardman.
Apart from Amazon having a data center there, I still didn’t see the connection. The other oddity was that all of these sessions had the same non-standard screen resolution of 800×600.
Don’t Jump to Conclusions
In this early stage, it’s easy to conclude that some rogue bot or process is messing with you. I kept trying different dimensions, and when I added Source / Medium, I saw that I knew the enemy, so to speak.
The Source / Medium column revealed that our Boardman traffic was somehow related to our email service provider and CRM. I wish I could say more, but the service provider is still researching this issue.
Once I homed in on the email service provider, I was able to go back and see that this problem had been around for a while. The Boardman traffic jumps corresponded to the days when we sent emails.
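One way to check that kind of correspondence is to compare average sessions on send days against all other days. This is only a rough sketch: the dates, session counts, and function name below are made up, and in practice the inputs would come from a Google Analytics export and the email platform's send log.

```python
from datetime import date
from statistics import mean

# Hypothetical daily Boardman sessions; dates and counts are invented.
daily_sessions = {
    date(2020, 1, 6): 10,
    date(2020, 1, 7): 210,
    date(2020, 1, 8): 14,
    date(2020, 1, 9): 8,
    date(2020, 1, 10): 190,
    date(2020, 1, 11): 12,
    date(2020, 1, 12): 6,
}

# Hypothetical days on which an email campaign went out.
send_dates = {date(2020, 1, 7), date(2020, 1, 10)}

def compare_send_days(daily_sessions, send_dates):
    """Return (average sessions on send days, average on all other days)."""
    on_send = [n for day, n in daily_sessions.items() if day in send_dates]
    off_send = [n for day, n in daily_sessions.items() if day not in send_dates]
    # Fall back to 0.0 when a group is empty, since mean() rejects empty input.
    return (
        mean(on_send) if on_send else 0.0,
        mean(off_send) if off_send else 0.0,
    )

send_avg, other_avg = compare_send_days(daily_sessions, send_dates)
print(f"send days: {send_avg}, other days: {other_avg}")
```

A large gap between the two averages supports the link to email sends, though it does not prove the cause on its own.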
Include and Exclude Segments
At this point, we could’ve built a Google Analytics filter to exclude this traffic going forward. If you’re not familiar with Google Analytics filters, you can check out this nice article from Loves Data. However, we needed to clean up the past data, and filters aren’t retroactive. If we didn’t clean up the historical data, our comparison data would be off.
I decided one approach would be to create two segments. The first would include the Boardman traffic based on the conditions I observed. This segment quantified how much fluff traffic was coming from Boardman.
I also wanted to build a complementary segment that excluded this same traffic. This segment would primarily be used in reporting. If done correctly, the two segments would add up to 100%.
Until we can get a resolution and/or directions from our vendor, we just use the Non-Boardman segment instead of All Users.
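The logic of the two segments can be sketched in a few lines. This is an illustration rather than how Google Analytics evaluates segments internally: the session records and field names are hypothetical, and the `example-esp / email` source/medium value is a placeholder because the real vendor is not named here.

```python
# Hypothetical session records; field names and values are illustrative only.
sessions = [
    {"city": "Boardman", "network_domain": "amazonaws.com",
     "screen_resolution": "800x600", "source_medium": "example-esp / email"},
    {"city": "Boardman", "network_domain": "amazonaws.com",
     "screen_resolution": "800x600", "source_medium": "example-esp / email"},
    {"city": "Boardman", "network_domain": "amazonaws.com",
     "screen_resolution": "800x600", "source_medium": "example-esp / email"},
    # A legitimate Boardman visitor that the segment should NOT catch.
    {"city": "Boardman", "network_domain": "example-isp.net",
     "screen_resolution": "1920x1080", "source_medium": "google / organic"},
    {"city": "Portland", "network_domain": "example-isp.net",
     "screen_resolution": "1366x768", "source_medium": "(direct) / (none)"},
]

def is_vendor_noise(session):
    """True only when ALL of the observed conditions match."""
    return (
        session["city"] == "Boardman"
        and session["network_domain"] == "amazonaws.com"
        and session["screen_resolution"] == "800x600"
        and session["source_medium"] == "example-esp / email"  # placeholder value
    )

# The include segment and its complement use the same predicate,
# so every session lands in exactly one of them.
boardman_noise = [s for s in sessions if is_vendor_noise(s)]
non_boardman = [s for s in sessions if not is_vendor_noise(s)]

assert len(boardman_noise) + len(non_boardman) == len(sessions)
print(f"noise share: {len(boardman_noise) / len(sessions):.0%}")
```

Defining the exclude segment as the exact negation of the include segment is what guarantees the two add up to 100%. Combining several conditions, rather than matching on city alone, keeps genuine Boardman visitors in the reporting segment.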
There were several lessons learned from this issue. The first is that there is something to that old adage, “location, location, location”. The second is that aggregate numbers can hide what is going on in the data. Just remember that, while you may not know the cause of the odd traffic, you can always build a segment to exclude it.