Thu, 21 Aug 2003

Bad Robot

To: robot@mirago.co.uk
Date: Fri, 15 Aug 2003 04:09:55 +0100


I've noticed that your robot has recently crawled my site and for every
page it's visited it's claimed to have been referred there from a page
on your site. This is incorrect behaviour. If you read the relevant
RFC[0] you'll see that section 14.36 states:

14.36 Referer

   The Referer[sic] request-header field allows the client to specify,
   for the server's benefit, the address (URI) of the resource from
   which the Request-URI was obtained (the "referrer", although the
   header field is misspelled.) The Referer request-header allows a
   server to generate lists of back-links to resources for interest,
   logging, optimized caching, etc. It also allows obsolete or mistyped
   links to be traced for maintenance. The Referer field MUST NOT be
   sent if the Request-URI was obtained from a source that does not have
   its own URI, such as input from the user keyboard.

As there is no link to any page on my site from the URI in question
(http://www.miragorobot.com/scripts/mrinfo.asp), it would seem to me
that your robot is not following the RFC.

If you wish to provide people with information about your bot, can I
suggest that you use the User-Agent header, as detailed in section
14.43 of the RFC?



[0] http://www.rfc-editor.org/rfc/rfc2616.txt

No reply as yet.

Given that pretty much every other robot that comes across this website seems to be able to do the right thing, you have to wonder why they can't. Of course, assuming they respect it, I can always use the robot exclusion standard (robots.txt) to deny them access. Doesn't make it any less annoying.
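For the curious, here is roughly what "the right thing" looks like from the robot's side. This is only a minimal sketch in Python using the standard library's `urllib.robotparser` and `urllib.request`, not Mirago's code or anyone else's. The bot name `ExampleBot`, the info URL and the helper functions `is_allowed` and `build_request` are all hypothetical. The idea is that the robot identifies itself (and points at its info page) in User-Agent, only sends Referer when it actually found the link on another page, and checks robots.txt before fetching.

```python
from typing import Optional
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity: the bot info page belongs in the
# User-Agent header (RFC 2616 section 14.43), not in Referer.
USER_AGENT = "ExampleBot/1.0 (+http://www.example.com/bot-info)"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the text of a site's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_request(url: str, found_on: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request for url.

    found_on is the URI of the page whose link led to url, or None if the
    URL came from a source without a URI of its own, such as a seed list.
    Per RFC 2616 section 14.36, Referer is sent only in the first case.
    """
    headers = {"User-Agent": USER_AGENT}
    if found_on is not None:
        headers["Referer"] = found_on
    return Request(url, headers=headers)
```

A URL taken from a seed list would go through `build_request(url)` and carry no Referer at all. A site owner who wants to shut out a particular bot can serve a robots.txt containing `User-agent: BadBot` followed by `Disallow: /`. A compliant crawler calling `is_allowed` would then skip every page, but it only works if the robot actually bothers to check.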

