
Thu, 08 Mar 2007

traffic reports in three easy steps

The nice people at the BBC provide, among other things, travel information. As I commute by car this sort of thing is handy to have. However, I can't be bothered to go and actually read a website and then pull out the bits I require from it.

I can be bothered to write code that does that for me.

The first thing is fetching the webpage. There are many ways to do this, but for the sake of ease I always find lynx or w3m the least hassle for getting the contents of a website:

w3m -dump_source http://www.bbc.co.uk/travelnews/local/scotland-tayside.shtml

This spews out the HTML of that page to STDOUT. I can then pipe this into something to filter it.

Originally I used the -dump argument to w3m to get the rendered page as text, and a combination of grep and cut pulled out what I wanted. After a while, though, the rubbish formatting of that irked me. At this stage I did what any reasonable person does and applied some Perl to the problem. A quick look at the source of the page shows there's a table with two columns, Location and Incident Report. Clearly I want to extract the contents of those two columns and then output only the rows that mention roads I travel on.

Handily, the first part of this is easily solved by HTML::TableExtract which, assuming $html contains the source of the page, does this sort of thing:

use HTML::TableExtract;

# extract just the Location and Incident Report columns
my $t = HTML::TableExtract->new( headers => [ "Location", "Incident Report" ] );
$t->parse( $html );

foreach my $table ( $t->table_states ) {
    foreach my $row ( $table->rows ) {
        # only keep rows mentioning a road I care about
        if ( $row->[0] =~ /m1/i ) {
            print $row->[0] . ": " . $row->[1] . "\n";
        }
    }
}

And I'll get exactly the information I want. Sadly the formatting will be rubbish. Fortunately we can just throw it at Text::Autoformat and all is well.
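
Text::Autoformat exports a single autoformat function, so the tidying really is a one-liner. A rough illustration, with $report standing in for the text produced above:

use Text::Autoformat;

# reflow the report into a tidy 72-column paragraph
print autoformat( $report, { right => 72 } );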

A bit of tidying up later and we have traffic_format. It takes a comma-separated list of roads as an argument, grabs the HTML from STDIN, and prints out a neatly formatted list of incidents.
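
The whole thing is little more than the pieces above glued together. A minimal sketch of what it might look like (assuming, as above, that the roads arrive as one comma-separated argument):

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use Text::Autoformat;

# roads to report on, passed as e.g. "a92,a90,a914"
my $road_list = shift @ARGV or die "usage: traffic_format road[,road...]\n";
my $roads_re  = join '|', map { quotemeta } split /,/, $road_list;

# the page source arrives on STDIN
my $html = do { local $/; <STDIN> };

my $t = HTML::TableExtract->new( headers => [ "Location", "Incident Report" ] );
$t->parse( $html );

foreach my $table ( $t->table_states ) {
    foreach my $row ( $table->rows ) {
        # skip rows that don't mention one of our roads
        next unless $row->[0] =~ /\b(?:$roads_re)\b/i;
        print autoformat( "$row->[0]: $row->[1]", { right => 72 } );
    }
}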

Now this is all well and good but clearly having to run w3m and pipe the results to traffic_format really isn't much of an improvement on visiting the page by hand. Ideally the information would get sent to me. I could have it emailed but then I'd have to remember to check my email. Clearly some sort of instant messaging is the solution.

In this case that sort is Jabber, and once again a little shim of Perl is all that's required. The shim in question is Jabber::SimpleSend, which lets us do this:

use Jabber::SimpleSend qw( send_jabber_message );

send_jabber_message({
    user     => $user,      # the jabber id to send from
    password => $pass,
    target   => $target,    # the jabber id to send to
    subject  => $subject,
    message  => $text,
});

Once more we add a bit of extra sugar round it and we end up with send_jabber, which reads some configuration from $HOME/.send_jabber_rc and a message from STDIN, and then punts off a Jabber message.
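
send_jabber itself is barely more than the call above with some file reading bolted on. A sketch of the idea, assuming (and this is an assumption, not the script's actual format) an rc file of one key=value pair per line:

#!/usr/bin/perl
use strict;
use warnings;
use Jabber::SimpleSend qw( send_jabber_message );

# read user, password and target from the rc file
# (assuming one key=value pair per line)
my %conf;
open my $rc, '<', "$ENV{HOME}/.send_jabber_rc"
    or die "can't open ~/.send_jabber_rc: $!";
while ( my $line = <$rc> ) {
    chomp $line;
    next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
    my ( $key, $value ) = split /\s*=\s*/, $line, 2;
    $conf{$key} = $value;
}
close $rc;

# the message body arrives on STDIN
my $text = do { local $/; <STDIN> };

send_jabber_message({
    user     => $conf{user},
    password => $conf{password},
    target   => $conf{target},
    subject  => $conf{subject} || 'traffic report',    # made-up default
    message  => $text,
});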

Now we've got all the bits in one place, all we need is a simple shell script to tie them together:

#!/bin/sh

# config
URI='http://www.bbc.co.uk/travelnews/local/scotland-tayside.shtml'
ROADS='a92,a90,a914'
# end config

# fetch the page, filter it, and send the result over jabber
w3m -dump_source "$URI" | traffic_format "$ROADS" | send_jabber -

And then run it at the right times via the magic of cron and you have both a handy traffic report and a demonstration of the joys of unix.
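
For example, a crontab entry along these lines (the path and script name here are invented for illustration) sends a report before the morning and evening drives on weekdays:

# traffic reports at 07:30 and 16:30, Monday to Friday
30 7,16 * * 1-5 $HOME/bin/traffic-report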

And yes, the code samples in this do look bad in Internet Explorer but it's late and CSS wrestling does not appeal.

posted at: 22:03 #
