Dave Beckett - Journalblog

Hacking the semantic linked data web

Month: May, 2006

XTech 2006: Open Street Map (OSM), Steve Coast

Presentation at XTech 2006 in Amsterdam

The Open Street Map (OSM) approach is to have both Free Data and Free Software, and to add to the data wiki-style.

There are various problems with using UK Ordnance Survey (OS) data, including IPR restrictions (even with academic licenses). That leaves little choice but to use out-of-copyright maps; the last open ones are from 1944, drawn during WWII and before motorways were constructed.

Steve showed an animation of tracks recorded by couriers in London, used to build up vectors of the roads in London. The tracks became rather thick when aggregated, due to the inaccuracy of GPS, but this does build up lots of data.

The OSM site provides RESTian APIs to the data, supporting both read and write of data in XML form, along with monthly dumps (planet.osm). He showed examples of the data in the UK that has been collected over the last 2 years, although not all of the data is yet shown on the OSM front page.
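To give a feel for what reading that XML looks like, here is a minimal Python sketch. The talk did not show the format, so the element and attribute names (node, way, nd, tag, lat, lon) are an assumption based on the OSM XML layout in use today; the format served in 2006 may well have differed. The sample data and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the present-day OSM XML layout (an assumption,
# not data from the talk).
SAMPLE = """<osm version="0.6">
  <node id="1" lat="50.70" lon="-1.30"/>
  <node id="2" lat="50.71" lon="-1.29"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Example Road"/>
  </way>
</osm>"""


def ways_as_polylines(osm_xml):
    """Return a list of (name, [(lat, lon), ...]) tuples, one per way."""
    root = ET.fromstring(osm_xml)
    # Index every node by id so that ways can look up their coordinates.
    nodes = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.findall("node")
    }
    result = []
    for way in root.findall("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        # Skip references to nodes that are not present in this document.
        points = [
            nodes[nd.get("ref")]
            for nd in way.findall("nd")
            if nd.get("ref") in nodes
        ]
        result.append((tags.get("name"), points))
    return result
```

Calling ways_as_polylines(SAMPLE) yields one named polyline, which is roughly the shape of data a renderer or router would start from.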

There are several third-party applications making use of OSM data, including the GpsDrive free software route tracker and the FlightGear flight simulator (who built a free flight simulator but forgot about terrain). It can even be used on some GPS units as a source of data for routing; the example given was a Garmin device. The XML data itself has increasingly been turned into prettier output, such as via XSLT into SVG, giving much more accessible maps.

The current OSM focus is the UK, but there is data collected elsewhere in Europe. The UK status is that all UK motorways are now mapped, along with several major cities and even a forest, the New Forest. So it is not just for the drivers of the world but also for walkers, covering paths, public rights of way and bridleways. (I thought maybe the UK Ramblers’ Association should be brought in here.)

In comparison, the US has a glut of data because, under US copyright law, the government cannot own copyright on the map data it creates: the TIGER/Line data. This can be loaded into OSM and annotated just like for any other country.

Recently, the OSM project ran a mapping event for the Isle of Wight, an island off the south coast of England. The last map that could be used as a basis was the copyright-free one from 1944, which gave a rough outline of the land and areas but is still rather crude. 30-40 volunteers walked, drove and cycled the island and covered 90%+ of the roads in two days, taking pictures and recording the road names for their traces, then turning these into annotated traces and then vectors, loaded into OSM itself. Not all of this is online yet. Two weeks ago there was another mapping event in Manchester, a built-up area, which causes technical problems. Feedback from the first event caused more volunteers to turn up than planned, and they needed training as they expected to be given GPS units when they arrived. There may also be a need for more expensive GPS units to handle city mapping.

As an aside, Steve pointed to the Isle of Man which is a blank island for roads in Google Maps as it is a separate country with different licensing and copyright laws, so has no licensed road maps. OSM does have maps for it since there are contributors who live there.

There has been fallout from the two events after publicity in an article in the UK Guardian newspaper and on blogs such as Boing Boing. The Guardian article mentioned a trap street in Bristol, Lye Close, which was made up to detect when maps were stolen. Or in Steve’s words: “they give you crappy data and make you pay for it”. Steve suggested that cartographers are very purist: they want to make the perfect map and prevent you scribbling on it. OSM doesn’t have to do this as it has no profit/business model.

An OS spokesman was quoted on their work and explained that maps are expensive to update at the level of detail they work at. However, Steve said a lot of people don’t want that data, just something at a rough scale, a personal scale, not down to the millimeter.

Finally he mentioned UK Postcodes (aka Zip Codes in the US), which are completely copyrighted by the UK Post Office, so you can’t touch them. Annoying. So OSM has been collecting GPS points + postcodes one by one, and although this isn’t going to be complete as there are 19M+ UK postcodes, it will give the 100m accuracy level most people care about: the first part + the number of the second part of the postcode.
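“The first part + the number of the second part” can be sketched in a few lines of Python. This is only an illustration: the regular expression is a simplified take on the UK postcode layout, and the function names, the (postcode, lat, lon) sample shape and the idea of averaging the collected points per area are my assumptions, not something described in the talk.

```python
import re
from collections import defaultdict

# Simplified pattern (an assumption): outward code such as "BS8" or "SW1A",
# optional space, then the inward code as one digit plus two letters.
_POSTCODE = re.compile(r"^([A-Z]{1,2}[0-9][0-9A-Z]?)\s*([0-9])[A-Z]{2}$")


def postcode_sector(postcode):
    """Return the first part plus the digit of the second part, e.g. 'BS8 1'."""
    match = _POSTCODE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable UK postcode: {postcode!r}")
    outward, sector_digit = match.groups()
    return f"{outward} {sector_digit}"


def sector_centroids(samples):
    """Average the (lat, lon) points collected for each shortened postcode.

    `samples` is an iterable of (postcode, lat, lon) tuples.
    """
    grouped = defaultdict(list)
    for postcode, lat, lon in samples:
        grouped[postcode_sector(postcode)].append((lat, lon))
    return {
        sector: (
            sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points),
        )
        for sector, points in grouped.items()
    }
```

For example, postcode_sector("BS8 1TH") gives "BS8 1", so every point recorded anywhere in that area contributes to a single averaged location.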

Namedropping XTech 2006 Amsterdam

I’ve arrived in Amsterdam and already I’ve bumped into a lot of lovely people – Tom Coates, Paul Hammond, Simon Willison (who were all on the train from the airport, and all Yahoo!s: proof), Matt Patterson, Edd Dumbill, Mark Nottingham, Uche Ogbuji, Bijan Parsia. While I typed this, Libby Miller and Damian “I’m in the Guardian” Steer just walked past to register.

And now back to writing the slides for my talk later this week: Semantics Through the Tag, on Friday.

Ooh, we are suddenly distracted by the new Apple x86 MacBooks being launched.

RSS to iCalendar

I made a quick hack and after showing it to a few people at the Jena User Conference 2006 in Bristol, it seems it might be worth showing to more people. I had this flash of inspiration^wthe obvious: RSS is a syndication feed format of items with (date, time, description), which sounds remarkably similar to a calendar file such as the iCalendar ICS files produced and consumed by calendar programs such as Apple’s iCal. So I made a program that uses Raptor’s rss-tag-soup parser to read any RSS/Atom feed into an RDF graph (*), queries it with SPARQL to grab the entries and then outputs them in the iCalendar format. Most of my time was spent dealing with the baroque \-escaping of the result format and trying to get the RSS dates to work, even with the help Raptor gives by using curl’s curl_parsedate to uplift from RSS date junk.

So the result is: rss2ical.c and here’s PlanetRDF‘s feed as a calendar: http://planetrdf.com/index.ics which works in at least the two programs I tested: Apple iCal and Google Calendar.

(Of course this feed is proper RSS 1.0/RDF so the rss-tag-soup parser is not necessary; I could use raptor’s guess parser to do the right thing.)

I tried it on some other feeds in the wild too and it mostly worked, but it looks like there are some other bugs to shake out. I’m interested in what people think of this as an alternative interface to the River of News format. It lets you skim a bit more easily in your calendar and see when news items arrived. I’m not sure if I’ll run it as a PlanetRDF service yet.

(*) This means with a slight modification, it could do an aggregated calendar if I loaded in multiple feeds.
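For anyone wanting to play with the idea without building Raptor and Rasqal, here is a rough Python sketch of the same feed-to-calendar mapping. It is not rss2ical.c: it swaps the RDF graph and SPARQL query for plain XML parsing, only handles un-namespaced RSS 2.0 item/title/link/pubDate elements, and does not fold long lines as the iCalendar format asks. The function names and the PRODID value are made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime


def ics_escape(text):
    """Backslash-escape a TEXT value: backslash, semicolon, comma, newline."""
    return (
        text.replace("\\", "\\\\")  # must come first
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\r\n", "\\n")
        .replace("\n", "\\n")
    )


def rss_to_ical(rss_xml):
    """Turn the dated items of an RSS 2.0 document into iCalendar text."""
    root = ET.fromstring(rss_xml)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//rss2ical sketch//EN",  # hypothetical identifier
    ]
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        pub_date = item.findtext("pubDate")
        if not pub_date:
            continue  # no date, so nothing to place on a calendar
        try:
            when = parsedate_to_datetime(pub_date)
        except (TypeError, ValueError):
            continue  # unparseable "RSS date junk"
        if when.tzinfo is None:
            # Assumption: treat dates without a usable zone as UTC.
            when = when.replace(tzinfo=timezone.utc)
        stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        lines += [
            "BEGIN:VEVENT",
            f"UID:{link or title}",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{stamp}",
            f"SUMMARY:{ics_escape(title)}",
        ]
        if link:
            lines.append(f"URL:{link}")
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"
```

Each item becomes one event at its publication time, with the title as the summary, which is enough for a calendar program to show when news items arrived.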

And one more thing … Rasqal RDF Query Library 0.9.12 Released

As tempting as it was to call this posting Not Blogging the Company, this isn’t about something I did for my employer, so instead I’ve borrowed Steve Jobs’ famous phrase. On Sunday 30th April I announced the release of Rasqal 0.9.12 which has a lot of internal changes but externally it’s mainly for the following:

This can, as usual, be tried out in the Rasqal RDF query demo.

What is new beyond the original announcement is that I’ve got the Red Hat Fedora Core 5 RPMs built and the Debian debs available from the Redland download site, thanks to my recently restored Xen 3.0 virtual machine server (5 Linux VMs – Fedora Core 5; Ubuntu Hoary, Breezy, Dapper; Gentoo).

Blogging the Company

It’s funny but I’ve never really got into blogging about my employer – Yahoo!. This is partially because there are things I can’t say (let’s call that Reason #1) but mainly because I’m mostly working on things that aren’t reaching the public. The company has a policy about blogging too and that’s also a consideration, so it’s easier to just not say anything. Plus, as my blog gets syndicated on Planet RDF and Planet XMLHack, I feel a little like I should keep on track. So although saying “Yahoo! thinks XML and RDF are really great” would cover that, it wouldn’t be true since I don’t speak for the company – see Reason #1. This doesn’t seem to put off other Yahoo! employees that blog, and there are more of those than people realise.

Recently the unofficial Y! Cool Thing and Planet Yahoo! sites were launched by some of the engineers, since they know that there’s lots of stuff Yahoo! provides that doesn’t get highlighted much. But there are actually lots more official and unofficial blogs tracking what we do or speculating wildly on what we do. Many of them are listed in the Y! Cool Thing sidebar, which right now has 24 official blogs. That’s pretty amazing, but not quite at the level of places like Microsoft’s Channel 9 / MSDN or maybe Sun’s blogs.sun.com. It’s a matter of perception maybe: do people think of Yahoo! as a big, closed enterprise, or are parts now more visible? Blogging the company from within is happening more and more.

So what brought on this introspection, Dave? I’m glad you asked. Today we launched Yahoo! Tech (I didn’t work on it myself), which is a brand new site for technology products to help you choose and use them, and has lots of Web 2.0 (TM)(R)© features. It’s the first big site in 5 years that the Yahoo! Media Group has launched, so one of the first to appear in the pervasive blog era. The site itself of course has expert bloggers called Tech Advisors, product reviews from users as well as from “professionals” and a lot more user generated content such as ratings, questions and answers etc. It’s a rich site. The launch was timed, as far as I know, for midnight US east coast time on 1st May. However, it appeared in networked syndication systems (Technorati, Bloglines, …) well before it arrived on paper, and over the course of the day it has also been mentioned in the personal blogs of the engineers working on it, as they now pretty much have equal access to the new publishing world. In their own words:

Yahoo! Tech is live as of today.
Alex Moskalyuk

The site might be a result of an explosion at the web 2.0 factory, but it also totally accessable and usable on screen readers and cell phones. Heck, I even fired up lynx and the site works fine, thanks to Ted.
Jeff Boulter

I pictured Tech as a mix between Cnet and Netflix. A place where you could get the information you needed, but also feel like it was personlized just for you. I think we’ve done just that.
Ted Drake

I’m working with a great group of people, and it’s been a real thrill to be the technical lead on a project this big.
Glen Campbell

There is more at the del.icio.us tag ytech where I’ve been recording some of the amusing mixture of wire stories, articles and blogs. This mashup of blogging a company also appears in the tech.memeorandum story (read soon before it disappears) and in blog searches such as Technorati, Google News or Yahoo News Blog Search (ours!).

Or you could always go read about it on paper…

At this point I’d insert a witty conclusion, but see Reason #1 :)