GoDaddy & Phantom Robots.txt Files
Domain Hosting June 11th, 2008
Today I was doing my due diligence on a client’s site and noticed that their robots.txt file was quite different from what I remembered. I thought this was mildly odd but chalked it up to my own oversight. I uploaded the new robots.txt file and the appropriate changes were made, no problem.
I then figured I’d check this site’s robots.txt file to make sure everything was as it should be, and noticed something peculiar. The robots.txt that was being displayed was not the robots.txt file that was in my root. What was going on here? For a second I half thought that friends had managed to get into my blog and mess with me a little, but I soon found out it was something more. I logged into my FTP client and deleted the robots.txt file. I then returned to my browser, cleared the cache and refreshed. Still, the file was there. It wasn’t MY robots.txt file and, according to my FTP client, it was no longer present in my root. So…um…what?
Google Webmaster Tools is reporting the robots.txt file as I had it written. Google Webmaster Tools is not enabled through GoDaddy, and there are no WordPress plugins that could be causing this phantom file to exist. I had several people check it from other IPs and with different user agents, and all of them got the same phantom robots.txt file.
Then, this article about GoDaddy uploading their own robots.txt files to customer sites was brought to my attention. I called GoDaddy and unfortunately some sort of clueless grandpa answered the phone. He was really nice and patient but he didn’t know what was going on. Didn’t even know what a robots.txt file was. I was on the phone with them for about an hour until I got through to their Advanced Hosting department. The new rep recognized the issue right away and opened a ticket for me, and this is where it was left. Hopefully they’ll resolve this problem.
To say that this is a bit disconcerting is a gross understatement.
If GoDaddy is intentionally messing with customers’ accounts, I’d chalk this up to business fraud. Needless to say, I’m searching for a new host, so if you have any recommendations, by all means let us know. Also, if you have GoDaddy horror stories or stories similar to this, we’d like to hear those as well.
Update: GoDaddy seems to have resolved the problem, and the robots.txt file is as it should be, at least for now. Needless to say, this makes me rather wary of GoDaddy and their hosting services.
Second Update: Just received an email from GoDaddy’s tech support:
Dear Sir/Madam,
Thank you for contacting Hosting Support.
We have reviewed the issue with a non existent robots.txt file being displayed on your squareoak.com site. While reviewing this issue we were able to determine that the robots.txt file is being displayed due to the .htaccess file in the root of your hosting account. There appears to be a mod_rewrite rule that is accessing the wordpress install in the /blog directory. WordPress is dynamically generating the file displayed. If you do not want this to happen you will need to modify the .htaccess file.
Please contact us if you have any further issues.
Aaron P
Hosting Support
Hosting Operations
Yeah, thanks guys. That was about as helpful as the hour-long phone call today with your Gomer Pyle-esque tech guy. The issue had nothing to do with my .htaccess file. My .htaccess file has been the same for months and has never had any bearing on the robots.txt file. The file was deleted and was still visible via multiple browsers, user agents, and IPs. This issue has recently been resolved and I’m guessing you guys did SOMETHING, and I thank you for that. I just hope it doesn’t happen to anyone else.

June 11th, 2008 at 5:53 pm
You know, there is this thing called caching that happens on the servers from time to time.
It’s entirely possible that the robots.txt file got cached in an intermediary pipe.
June 11th, 2008 at 6:12 pm
NearlyFreeSpeech.net is *much* better than the idiots at GoDaddy.
June 11th, 2008 at 6:16 pm
I use eleven2.com for my web host, and I love it!
June 11th, 2008 at 6:18 pm
@REWIND I don’t know if this is the case. As of 3PM EST I deleted the robots.txt file from the root. As you can see here: http://www.squareoak.com/robots.txt/ it still exists. Also, the robots.txt file that is present is not the one that I had in there, so it was overwritten, and not by me. No one else has access to this site except GoDaddy.
June 11th, 2008 at 7:22 pm
For those who have asked, the previous and original robots.txt file was:
User-agent: *
Disallow: /blog/wp-login
June 11th, 2008 at 8:27 pm
It’s kind of funny that none of you noticed the link is /robots.txt/, not /robots.txt; however, viewing both redirects to /robots.txt/ and displays the * Disallow. Here’s a hint: it’s happening at the web server level (or web proxy level if they have one in place). This may be a system-wide setting, I don’t know, but I find it funny that you “experts” are so clueless.
June 11th, 2008 at 8:37 pm
@anonymous, you’re right, that was my mistake adding the trailing slash at the end of the link in the previous comment. This was not the issue, however. The issue, as confirmed by a GoDaddy rep, was that there is a phantom robots.txt file.
June 11th, 2008 at 9:10 pm
WordPress automatically “generates” a robots file if you don’t have one in your webroot.
http://wordpress.org/support/topic/125037
June 11th, 2008 at 9:24 pm
I am glad you got it resolved. Support told me I didn’t know what I was talking about. I left it at that and immediately found another host. I use HostGator. They are amazing. Great service and support and great prices.
Good luck in the future. If you want to hear some horror stories, go to nodaddy.com
June 11th, 2008 at 9:35 pm
Brendan,
I’m sorry to hear you’re rather wary of our hosting services. I wanted to assure you that we *do not* add a robots.txt file to any of our shared hosting accounts. Many applications, such as WordPress, automatically generate a robots.txt which is triggered by rewrite rules.
Regards,
Alicia R.
Go Daddy Hosting
June 11th, 2008 at 9:53 pm
I had the same problem with Network Solutions.
Interestingly, the problem only manifested after my site started generating high traffic and I started making money from advertising. It also coincided with NS’s increased marketing of their SEO products.
I didn’t buy their SEO products…and they never resolved the problem. After three months of back and forth with them, I have decided to set up my own server and host my site myself.
June 11th, 2008 at 10:42 pm
Brendan:
If you are still considering hosting options, I recommend using CrystalTech.com (CT) web hosting. I have been hosting multiple web accounts with them since 2003 and never looked back.
Having utilized Rackspace and others before, I have yet to find service and performance as I have with CT Web Hosting. Hands down the best service from them is that the main tech support number connects to the network operations center and not a typical operator support center with limited knowledge.
If you ever need to contact CT, generally sysadmins answer the phone and solve most issues directly. I cannot say enough positive things about this company…they even offer FTPS over SSL, the next best FTP connection protocol to using a Security Token Key Fob.
Best of luck and thanks for the GoDaddy robots.txt information.
Cheers,
Isabella
June 11th, 2008 at 11:03 pm
@mchau, @Alicia R. I assure you, this wasn’t WordPress related.
June 12th, 2008 at 1:29 am
mchau provided a link that explains that this is a WordPress feature that automatically creates a robots.txt when one doesn’t exist and the alleged robots.txt that Go Daddy is forcing on you explicitly refers to blog/wp-login.php. But you’re positive that this isn’t WordPress related? How is that?
June 12th, 2008 at 6:00 am
Eleven2.com ? Never heard of them.
Apparently they’re up to date. MySQL 4.1 advertised.
Apparently they’ve got the latest good quality infrastructure. That includes, and I quote from two pages, “Serial attack storage”.
Oh, and they link to their .net domain which has a 404. And their .org is “coming soon”.
Very cool.
FWIW we use GoDaddy for their domains, not their hosting. Their web interface does suck badly, however.
June 12th, 2008 at 7:02 am
I’m pretty sure you’re wrong in this instance. While there are plenty of good reasons not to host with GoDaddy, this unfortunately isn’t one of them.
Take a look at the WordPress link that was posted earlier. If WordPress does not find a robots.txt file in your root, it *dynamically* generates one, on the fly.
Now, what caused your original custom robots.txt file to be changed? I could only guess.
June 12th, 2008 at 7:08 am
@Alicia
Brendan is correct: this is not something that any application created. If it were, it would have shown up in the FTP client. This is nothing more than GD trying to reduce load on their servers. You can deny it all you want, but this is a deliberate tactic, just like your insane $200 domain reactivation fees.
June 12th, 2008 at 9:14 am
As I write this, if I request /robots.txt then I see “Disallow: /blog/wp-login”. If I request /robots.txt/ I get an empty “Disallow: ” rule. And if I ask for /random.txt then I see a sexy 404 page with “Nothing Found for Random Txt” as the HTML title. Clearly, you have some code somewhere mucking with the URL structure. I can’t point you to exactly where the code is, but it might be worth digging through how WordPress handles 404-errors, not just robots.txt.
June 12th, 2008 at 10:22 am
@All Those Who Think They’ve Figured This Out. Plainly speaking…the robots.txt file was deleted completely and was not able to be seen in the root via a 3rd party FTP client AND GoDaddy’s web based FTP client. Yet the robots.txt file was still there/visible in the multiple browsers, user agents, and IPs. This has nothing to do with WordPress, .htaccess files, headers, or trailing slashes.
June 12th, 2008 at 10:35 am
I loathe GoDaddy because of what they did to Fyodor (among other things), but in this case, the culprit could well be WordPress. To test this, I set up a local install of WordPress. I confirmed that there was no robots.txt, then I went to http://localhost/robots.txt, and got “User-agent: *\nDisallow:”. How does it get there? Check .htaccess. It redirects all URLs to WordPress’s index.php. That goes through a bunch of contortions to finally take you to wp-includes/rewrite.php, which contains a special case for robots.txt. No robots.txt file is *generated*; it’s merely served. If you go to http://localhost/2008/02/27/a-picture-of-my-cat, and you get a response, you don’t assume that your server has a directory 2008 with subdirectory 02 with subdirectory 27, etc. Why should you assume the same for some other file?
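The flow described in the comment above can be sketched as a toy front controller. This is a simplified model, not WordPress's actual code: the function name `resolve`, the dict standing in for the document root, and the response labels are all hypothetical. Only the default body "User-agent: *\nDisallow:" is taken from the comment.

```python
# Toy model of "serve a real file if it exists, otherwise hand the request
# to the application". Assumption: names and structure are illustrative only.

VIRTUAL_ROBOTS = "User-agent: *\nDisallow:"  # default body quoted in the comment


def resolve(path, files_on_disk):
    """Return (source, body) for a request path.

    files_on_disk maps paths to file contents and stands in for the
    document root that an FTP client would list.
    """
    # A file that exists on disk is served directly by the web server.
    if path in files_on_disk:
        return ("static file", files_on_disk[path])
    # Otherwise the catch-all rewrite sends the request to the application,
    # which special-cases robots.txt and answers without creating a file.
    if path == "/robots.txt":
        return ("application", VIRTUAL_ROBOTS)
    return ("application", "404 Nothing Found")


docroot = {"/robots.txt": "User-agent: *\nDisallow: /blog/wp-login"}
with_file = resolve("/robots.txt", docroot)

docroot_after_delete = {}
without_file = resolve("/robots.txt", docroot_after_delete)
```

In this model, deleting the file never removes the response: `without_file` still carries a robots.txt body even though `docroot_after_delete` is empty, which is the behaviour an FTP listing cannot reveal.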
June 12th, 2008 at 10:47 am
Assuming that a robots.txt file which you upload overrides the default, I don’t see this as much of an issue.
They’re doing this so that their error log files don’t fill up with 404s as quickly, and the robots.txt that they serve is neutral in nature.
If this is their goal, a better way would probably be to include a default neutral robots.txt in newly created users’ directories.
June 12th, 2008 at 10:59 am
Brendan,
The file doesn’t exist on your server’s machine, so you won’t see it in FTP or the web interface.
It was being generated dynamically (for each request) by WordPress due to a mod_rewrite rule, as mchau pointed out:
http://wordpress.org/support/topic/125037
Thanks,
Paul
June 12th, 2008 at 11:11 am
@Paul Barrass, This is interesting and is the closest evidence I’ve seen to putting the blame on WordPress for the dynamic generation of the robots.txt file. However, in your link Otto42 states:
“You’re not finding it because it’s not there. WordPress generates a robots.txt response to web requests for it only if you don’t have the file there.”
From the beginning, the original file was in the root but the browser was displaying the phantom one. So either WordPress generates this regardless of a preexisting robots.txt file or something else (possibly server-side) was happening.
June 12th, 2008 at 2:16 pm
Are you certain this has nothing to do with WordPress? I just checked a new client’s site — they’re hosted by GoDaddy, aren’t running any CMS, and I know that I haven’t published a robots.txt for them. I got a 404 when I tried to get to domain.com/robots.txt, so clearly GoDaddy hasn’t thrown one up there themselves.
June 12th, 2008 at 3:26 pm
Hi Bob. Yeah, I’m not entirely sure what is going on here. There seems to be a chance that WordPress could be dynamically serving up a robots.txt, but there seems to be more evidence to suggest that GoDaddy had something to do with it. So I’m not too sure. I’ve never had this problem before, and my .htaccess file and my WordPress install in general have stayed the same. So out of nowhere I get this random anomaly where the robots.txt file that I see in my root, file (A), is being displayed in my browser as a different robots.txt file (B). So I delete the robots.txt entirely and yet file (B) can still be seen in my browser. It doesn’t seem to matter if I upload a new/blank robots.txt file or not. File (B) is still visible. So I call GoDaddy, open a support ticket and wait. About 5 hours later file (B) is no longer visible in my browser. I then upload original file (A) and everything is back to normal.
June 13th, 2008 at 12:58 am
Let’s pretend like this wasn’t a WordPress issue, which it seems clearly to be. Why would Go Daddy put up a robots.txt that disallows bot access to wp-login.php? Your stated theory of saving bandwidth just doesn’t ring true in this instance because wp-login.php just isn’t a high-bandwidth page (moreover, if you go over on bandwidth you get a charge so it seems like they’d encourage usage). Presumably not every site hosted there has a WordPress blog, so it seems far-fetched to think that this was something that they did to every account.
You casually accuse them of fraud, but to what end? That’s a serious question. You seem to be jumping to malfeasance without warrant: there are plenty of alternative explanations and culprits that fit the scenario.
June 13th, 2008 at 5:17 am
@Brendan, There appears to be a rewrite rule that is trying to remove trailing slashes in a way that doesn’t work. This is a minor misconfiguration and has very little impact, aside from being confusing. The rule would look something like “RewriteRule (.*)/$ $1 [R]” and should actually be “RewriteRule (.*)/$ $1 [R,L]”. It may be in your .htaccess file, or it could be part of GoDaddy’s configuration.
What’s happening is that the [R] flag is rewriting a URL like “robots.txt/” into an absolute URL “http://www.squareoak.com/robots.txt”. It also changes the requested filename from something like “/path/to/robots.txt” into “http://www.squareoak.com/path/to/robots.txt”. It does this to signal to Apache to send an external redirect, instead of just silently rewriting the URLs. The problem is, without the [L] flag, we continue evaluating other rewrite rules after we’ve done this conversion. If those rules rely on the requested file being right, you’re going to get weird behavior.
One of the rewrite rules we evaluate after doing this trailing slash removal bit is WordPress’s robots.txt rule. It basically says, “If the requested file does not exist and the requested URL is for robots.txt, act like the user requested index.php?robots=1”. Well, since the [R] has changed the requested filename to be invalid, but the URL is still for robots.txt, this rule gets applied.
How does this explain what you’re seeing? I think you may have initially gone to http://www.squareoak.com/robots.txt/ (with the trailing slash). The incorrect rewrite rule basically caused you to see WordPress’s dynamically-generated robots.txt. You noticed this wasn’t your robots.txt file. You went in and deleted your robots.txt file in an attempt to verify that there was a “phantom” robots.txt file being served by GoDaddy. But when you did this, going to http://www.squareoak.com/robots.txt (with or without the trailing slash) caused WordPress to serve its index.php?robots=1. So you think you’ve confirmed that there’s a phantom robots.txt. You come back later, you put your robots.txt file back, you check http://www.squareoak.com/robots.txt (without the trailing slash), see it’s working now, and think, “Ah, GoDaddy fixed the problem.”
The bright side is, GoogleBot (and every other bot) is requesting “robots.txt” without a trailing slash, so the right file is being served.
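The rule chain described above can be modelled in a few lines. This is a deliberately simplified simulation of the commenter's explanation, not of Apache's real mod_rewrite internals: the function `evaluate`, the parameter `strip_slash_is_last`, and the set standing in for the filesystem are hypothetical names chosen for illustration.

```python
# Simplified model of the two rules described in the comment. Assumption:
# this mirrors the commenter's explanation, not Apache's actual internals.

SITE = "http://www.squareoak.com"


def evaluate(url_path, files_on_disk, strip_slash_is_last):
    """Walk the two rules in order and report what the visitor would get."""
    filename = url_path  # what later rules test for existence
    redirect = None

    # Rule 1: strip a trailing slash and mark the request for a redirect.
    if url_path.endswith("/") and url_path != "/":
        url_path = url_path[:-1]
        redirect = SITE + url_path
        # Per the comment, [R] turns the filename into an absolute URL,
        # which can never match a file on disk.
        filename = redirect
        if strip_slash_is_last:  # models the [L] flag: stop here
            return "redirect to " + redirect

    # Rule 2: the robots.txt rule, applied when the file "does not exist".
    if url_path == "/robots.txt" and filename not in files_on_disk:
        return "index.php?robots=1"

    if redirect is not None:
        return "redirect to " + redirect
    return ("static " + filename) if filename in files_on_disk else "404"


disk = {"/robots.txt"}
without_l = evaluate("/robots.txt/", disk, strip_slash_is_last=False)
with_l = evaluate("/robots.txt/", disk, strip_slash_is_last=True)
no_slash = evaluate("/robots.txt", disk, strip_slash_is_last=False)
deleted = evaluate("/robots.txt", set(), strip_slash_is_last=False)
```

Under these assumptions, the trailing-slash request without the stop flag falls through to the dynamic robots.txt even though the file exists, while the slash-free request gets the static file until that file is deleted.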
June 13th, 2008 at 11:22 am
@Bill, The robots.txt file that you see now was not the file that GoDaddy had in there before. The file that’s there now is the file that’s SUPPOSED to be there. Previously there was a robots.txt that wasn’t mine, I hadn’t uploaded it and I had no idea where it came from. I was then sent an article by a friend that mentioned GoDaddy uploading robots.txt files to customer accounts. Whether this is intentional or a mistake I just don’t know. The theory you talk about isn’t mine but that of the folks from Geek Daily, the article I reference in this post. I’ve stated the evidence over and over again and find myself replying to comments like yours, where it’s quite obvious that the article and previous comments haven’t been thoroughly read through. Read what I’ve posted above in the comments completely and please do tell us what you think afterwards.
June 13th, 2008 at 11:26 am
@Dak, Good info here. Thank you for taking the time to write this but please see here: http://www.squareoak.com/blog/godaddy-phantom-robotstxt-files/#comment-2821
June 19th, 2008 at 4:45 pm
Hrm. If the tiny little “Update” paragraph was there when I commented before, I missed it. In any event, since nothing was ever said about what the content of the phantom robots.txt file was, it wasn’t obvious that you were talking about some other content than what I was seeing when I commented. Otherwise it would have been a lot more obvious that something had changed, and I wouldn’t have wasted our time.
June 19th, 2008 at 10:00 pm
@sapphirecat Yeah, just added the ‘email updated comments’ plugin. In any event, the content of the phantom robots.txt file was mentioned before but I’ll mention it again:
GoDaddy’s version:
User-agent: *
Disallow:
My Original version:
User-agent: *
Disallow: /blog/wp-login
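For readers unsure what the practical difference between the two versions is, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `allowed` and the two test URLs are assumptions made for illustration; the two robots.txt bodies are copied from the comment above.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="*"):
    """Parse a robots.txt body and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two versions quoted in the comment.
phantom = "User-agent: *\nDisallow:"
original = "User-agent: *\nDisallow: /blog/wp-login"

# Hypothetical URLs used only to exercise the rules.
login_url = "http://www.squareoak.com/blog/wp-login.php"
blog_url = "http://www.squareoak.com/blog/"

phantom_login = allowed(phantom, login_url)    # empty Disallow blocks nothing
original_login = allowed(original, login_url)  # prefix match on /blog/wp-login
original_blog = allowed(original, blog_url)    # other paths stay crawlable
```

An empty `Disallow:` line permits everything, so the phantom version differs from the original only in leaving the login page open to crawlers.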
June 20th, 2008 at 3:39 pm
I’m Otto, a moderator on the WordPress.org support forums.
The latest versions of WordPress DO generate robots.txt files via rewrites, if there is no robots.txt file already present in the main WordPress directory.
If you go to the Settings->Privacy menu option in WordPress, then you’ll see an option there that changes WordPress to block or allow search engines to index the site. If you switch this, then you’ll notice the “pretend” robots.txt file changes.
So yes, WordPress is absolutely generating this fake file. GoDaddy hosting is NOT at fault here.
June 20th, 2008 at 10:44 pm
@Otto Yes I’m well aware that WordPress does this. I don’t think this is what was happening and I haven’t seen any further changes in my robots.txt file. When you’ve had a site for a given amount of time and all settings/plugins stay the same and then one day something like the robots.txt file mysteriously changes, one has to wonder.
August 13th, 2008 at 5:44 pm
Just checking to see if you have made any headway on your robots.txt issue yet. I am going through the same issue and have a GoDaddy host with a WordPress blog. I have been back and forth with all of them and haven’t been able to fix the problem. I have followed up on every bit of advice/information from everyone on this thread but still have not made any progress.
September 29th, 2008 at 5:18 am
I went through this thread and it was a life saver. I was trying to figure out how someone could see my robots.txt file when I couldn’t find it anywhere in my WordPress installation…I host with GoDaddy as well. Something fishy was definitely going on….but all I did was upload a new robots.txt file to the root of my WordPress installation and set it to allow everything. I went back to Google Webmaster Tools and I could upload my sitemap with no problem. Thanks guys.