Sometimes, it’s the small items you don’t know about that bite you. After a poor support experience with my first web hosting company, I decided to move my site. I talked to the new company’s sales rep beforehand and felt confident I had made the right decision. But there was one question I didn’t even know to ask until various URLs started returning 404 errors. It’s what I call the “capital crime”.
I quickly learned that a 404 error means the web server couldn’t find the requested resource. What I didn’t realize then was that server operating systems (OS) treat letter case in URLs differently. My first web server was Microsoft Windows based, and the new hosting company used Linux.
Initially, I created my web pages using a naming convention such as:
/Stories/OnceUponATime.htm
I found using mixed cases easier to read than:
/stories/onceuponatime.htm
Although I was consistent in naming pages with this URL convention, I wasn’t careful when linking to those pages within my site. For example, I might have linked to a page using all lowercase. Over time, other websites pointed to these pages too; some used lowercase and others used the mixed-case URLs.
Windows vs. Linux
People may say Microsoft isn’t tolerant, but it is when it comes to URLs. You can use pretty much any combination of upper and lower case in the path, and a Windows server will still display the right content. In other words, Microsoft’s IIS web server ignored my careless behavior.
With Linux, you need to provide the URL with the correct case. A Linux server sees the two URLs above as different pages. When I moved the website to the new web host, the move exposed my bad behavior in the form of a number of broken links.
I may have linked OnceUponATime.htm correctly in the navigation, but someplace else on the site, I used onceuponatime.htm and that link presented a problem. Readers could not find that page and the web server gave a 404 error.
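To make the difference concrete, here is a minimal Python sketch that imitates the two lookup styles. It is only a toy model, not how IIS or a Linux web server is actually implemented, and the `PAGES` set and both function names are made up for illustration.

```python
# Hypothetical set of files on the site, named with the mixed-case convention.
PAGES = {"/Stories/OnceUponATime.htm"}


def lookup_case_sensitive(path):
    """Linux-style lookup: the path must match exactly, letter for letter."""
    return 200 if path in PAGES else 404


def lookup_case_insensitive(path):
    """Windows-style lookup: letter case is ignored when matching."""
    lowered = {p.lower() for p in PAGES}
    return 200 if path.lower() in lowered else 404


for link in ["/Stories/OnceUponATime.htm", "/stories/onceuponatime.htm"]:
    print(link, lookup_case_insensitive(link), lookup_case_sensitive(link))
# /Stories/OnceUponATime.htm 200 200
# /stories/onceuponatime.htm 200 404
```

The all-lowercase link works under the forgiving lookup and fails under the strict one, which is the situation the site was in after the move.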
Presenting the Proper Case
I lucked out: a friend explained the server differences and created some .htaccess rules that rewrote the URLs. An .htaccess file is a per-directory configuration file for the Apache web server, and it can help you fix these types of problems. However, it’s a dangerous file if you don’t know what you’re doing, since a mistake in it can take pages offline. Many hosting companies don’t allow you to access it at all.
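The actual rules aren’t reproduced here, so the following is only a rough illustration of the idea, written in Python rather than Apache syntax. It assumes the fix works by matching a request against the real paths without regard to case and redirecting to the correctly cased URL. The `REAL_PATHS` list, the `resolve` function and the choice of a 301 redirect are all assumptions for the sake of the example.

```python
# Hypothetical list of the real, mixed-case paths on the site.
REAL_PATHS = ["/Stories/OnceUponATime.htm"]

# Index the real paths by their lowercase form.
# Assumes no two pages differ only by letter case.
CANONICAL = {p.lower(): p for p in REAL_PATHS}


def resolve(requested):
    """Return a (status, path) pair for a requested URL path."""
    canonical = CANONICAL.get(requested.lower())
    if canonical is None:
        return 404, None        # no such page in any casing
    if canonical == requested:
        return 200, canonical   # exact match, serve as-is
    return 301, canonical       # wrong casing, redirect to the real URL


print(resolve("/stories/onceuponatime.htm"))
# (301, '/Stories/OnceUponATime.htm')
```

Redirecting, rather than quietly serving the page under both spellings, keeps a single address for each page.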
Regardless of which web server you use, you should think about how you’re going to approach URLs. Here are some suggestions:
- Be consistent with letter case in your URLs. Make sure you and the other members of the team abide by that convention. Some content management systems (CMS) use lowercase by default. Lowercase URLs are probably easier for people using mobile devices too. They also make life easier for phone support people, who can tell customers something like, “the following URL is all lowercase”.
- If you don’t need file extensions for your public pages, skip them unless they signal that a program such as Adobe Reader or Microsoft Excel is required. This makes it easier if you want to move from one platform to another. For example, instead of having “resources.htm” or “resources.aspx”, consider “resources”.
- Avoid using spaces in URLs. While they work, the space gets translated to a “%20” which looks ugly. It’s also a pain if you have to spell out the URL to someone.
- If you like the appearance of spaces to separate words, have everyone agree on whether dashes (-) or underscores (_) will be used. I think dashes are easier to read, and Google’s URL guidelines recommend hyphens over underscores. Sometimes people mistake underscores for spaces when they see them in print.
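If you want to apply a convention like this automatically, a small helper can do it. The sketch below is one possible approach, not a standard: the `to_slug` name, the list of extensions it strips, and the decision to turn underscores into dashes are all my own assumptions. It also uses `urllib.parse.quote` from the Python standard library to show what happens to spaces when they are left in.

```python
import re
from urllib.parse import quote


def to_slug(name):
    """Turn a page name into a lowercase, dash-separated URL segment."""
    # Drop common page extensions (a hypothetical list; adjust to your site).
    name = re.sub(r"\.(html?|aspx)$", "", name.strip(), flags=re.IGNORECASE)
    # Replace runs of spaces or underscores with a single dash.
    name = re.sub(r"[\s_]+", "-", name)
    return name.lower()


print(quote("Once Upon A Time.htm"))    # Once%20Upon%20A%20Time.htm
print(to_slug("Once Upon A Time.htm"))  # once-upon-a-time
print(to_slug("Resources.aspx"))        # resources
```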
Go play around and see examples where pages are no longer found when you change the case of a letter in the URL. For example, go to http://www.google.com/webmasters/ and change the “w” in webmasters to uppercase and see what happens.