Thu, 21 Aug 2003
Bad Robot
To: robot@mirago.co.uk
Date: Fri, 15 Aug 2003 04:09:55 +0100

Hi,

I've noticed that your robot has recently crawled my site and for every page it's visited it's claimed to have been referred there from a page on your site. This is incorrect behaviour. If you read the relevant RFC[0] you'll see that section 14.36 states:

    14.36 Referer

    The Referer[sic] request-header field allows the client to specify, for the server's benefit, the address (URI) of the resource from which the Request-URI was obtained (the "referrer", although the header field is misspelled.)

    The Referer request-header allows a server to generate lists of back-links to resources for interest, logging, optimized caching, etc. It also allows obsolete or mistyped links to be traced for maintenance. The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.

As there is no link to any page on my site from the URI in question ( http://www.miragorobot.com/scripts/mrinfo.asp ) it would seem to me that your robot is not following the RFC. If you wish to provide people with information about your bot then can I suggest that you use the User-Agent header as detailed in section 14.43 of the RFC.

thanks
Struan

[0] http://www.rfc-editor.org/rfc/rfc2616.txt
No reply as yet.
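For what it's worth, here's a rough sketch of what the email is asking for: put the bot's info URL in User-Agent and only send Referer when the URL really was found on another page. The bot name, the info URL and the build_crawl_request helper are all made up for illustration; this is not how any real robot is implemented.

```python
from urllib.request import Request

# Hypothetical bot name and info page, used purely for illustration.
BOT_USER_AGENT = "ExampleBot/1.0 (+http://www.example.com/bot-info.html)"


def build_crawl_request(url, found_on=None):
    """Build a request that identifies the bot via User-Agent.

    Referer is set only when the URL was actually discovered on another
    page (found_on); otherwise the header is left out entirely.
    """
    headers = {"User-Agent": BOT_USER_AGENT}
    if found_on is not None:
        headers["Referer"] = found_on
    return Request(url, headers=headers)


# A URL taken from a seed list has no referring page, so no Referer.
seed_request = build_crawl_request("http://www.example.org/index.html")

# A URL found as a link on another page may legitimately carry a Referer.
linked_request = build_crawl_request(
    "http://www.example.org/about.html",
    found_on="http://www.example.org/index.html",
)
```

Nothing is sent over the network here; the Request objects are only built so the headers can be inspected.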
Given that pretty much every other robot that comes across this website seems to be able to do the right thing, you have to wonder why Mirago's can't. Of course, assuming they respect it, I can always use the robot exclusion standard (robots.txt) to deny them access. Doesn't make it less annoying.
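A minimal sketch of that exclusion, assuming the robot identifies itself with a token like "ExampleBot" (a placeholder; I haven't checked what token the real robot uses). The rules are parsed with Python's standard urllib.robotparser, so it's easy to see what a well-behaved robot would conclude:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one robot, allow everyone else.
# "ExampleBot" is a placeholder token, not the real robot's name.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

page = "http://www.example.org/index.html"
blocked_bot_allowed = parser.can_fetch("ExampleBot/1.0", page)
other_bot_allowed = parser.can_fetch("OtherBot/2.0", page)

print("ExampleBot allowed:", blocked_bot_allowed)
print("OtherBot allowed:", other_bot_allowed)
```

Of course this only helps if the robot bothers to read the file and obey it.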
posted at: 06:32