Dave Beckett - Journalblog

Hacking the semantic linked data web

Month: April, 2006

BBC Programme Catalogue Launches with RDF

Now this is cool, and I’ve seen it in demo form for a while:

“BBC Programme Catalogue is live”, announces Matt Biddulph.

75 years of BBC programme data, all clickable on the web and ready to browse and search, with tags, tag clouds and sparklines of activity. UK TV heaven. No video present but it’ll answer all those “who was in X episode of Y” questions (after a little cataloguing and upload delay for recent things).

But there’s more! RDF/FOAF descriptions of every show and contributor – an API in plain view as Matt put it. There might be some more RDF format improvements as the focus has been on the web side so far.

Well done Matt and team (Tom, Ben, Julie, Adam, BBC cataloguers, others…)

Tom Coates also gives his launch posting

European Semantic Web Conference Tour

I’ve been allowed to escape from the US after 6 months of good behaviour to go to some useful conferences in Europe. Just when the sunshine appears in California. Doh!

First to the Jena User Conference in my previous home city of Bristol, UK from May 10-11. I’m not speaking here, so I guess that makes me a listener. Of course, I’m also a little bit of a competitor so it’s useful to see what they are up to ;)

After that I head to XTech 2006 in Amsterdam, NL from May 16-19 where I will be speaking on Semantics Through the Tag which is mostly about tagging with a little bit of RDF. It’s my first public talk since joining Yahoo! although not about anything directly related to what I’m working on. Last year, this was the best technical conference I had been to for several years in the topics, the event and the people. A web technology update without the hype.

Finally WWW2006 in Edinburgh, UK from May 23-26. The annual web conference which has never before been held in the UK and has chosen a great city as a venue. I’m looking forward to it and I’ve noticed there are many Yahoos attending this (as well as XTech).

I hope to meet lots of old and new friends in May. It should be great!

Raptor 1.4.9 with Tutorial

Yesterday I released Raptor 1.4.9 and the major visible addition this time is the first version of the new Raptor Tutorial covering all the parsing and serializing functions with working full examples. Used together with the updated Raptor Reference Manual which now covers 100% of the public API of functions, structures and defines, these provide a complete set of docs. (There is also the libraptor manual page but that’s primarily for command-line use in unix/linux).

There are, of course, more fixes and improvements:

  • The rapper utility can now pretty-print RDF using namespaces from parsing as hints in serializing.
  • The Turtle parser has gained boolean literals which were accidentally left out last time, oops!
  • Requests for content to parse now send appropriate HTTP Accept: headers depending on the parser used.
  • It is no longer required to use libxml2 for the rss-tag-soup (Atom, RSS*) parser
  • Various Win32 fixes and VC build files updates from John Barstow
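The Accept: header item is easy to picture with a small sketch. This is not Raptor's code (Raptor is C, and I'm not quoting its real header values here): the table of MIME types and q-values below is an assumption made up for illustration, and only the parser names are taken from Raptor. The idea is just that each parser asks for the formats it can actually read.

```python
import urllib.request

# Hypothetical preference table: parser name -> [(MIME type, q-value)].
# The MIME types and weights are illustrative assumptions, not Raptor's.
ACCEPT_PREFERENCES = {
    "rdfxml": [("application/rdf+xml", 1.0), ("text/rdf", 0.6)],
    "turtle": [("application/x-turtle", 1.0), ("text/turtle", 0.8)],
    "ntriples": [("text/plain", 1.0)],
    "rss-tag-soup": [("application/rss+xml", 1.0),
                     ("application/atom+xml", 1.0),
                     ("text/xml", 0.5)],
}


def accept_header(parser_name):
    """Build an Accept: header value for the given parser name."""
    parts = []
    for mime_type, q in ACCEPT_PREFERENCES.get(parser_name, []):
        # A q-value of 1.0 is the default, so it is left implicit.
        parts.append(mime_type if q >= 1.0 else "%s;q=%.1f" % (mime_type, q))
    # Always fall back to "anything" with a low preference.
    parts.append("*/*;q=0.1")
    return ", ".join(parts)


def build_request(url, parser_name):
    """Prepare (but do not send) an HTTP request carrying the header."""
    return urllib.request.Request(
        url, headers={"Accept": accept_header(parser_name)})
```

So `build_request("http://example.org/data.ttl", "turtle")` would ask for Turtle first and anything else as a last resort, which gives a content-negotiating server a chance to pick the right representation.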

I previously described how Raptor was refactored in detail and the results of the latest changes are mostly internal with respect to the SAX2 API. The public result is that the Atom support inside the RSS Tag Soup parser is ready but not fully enabled to handle the enveloping of X-in-Atom that people are now trying. I guess this is better than using SOAP or (ugh) XML-RSS.

Refactoring Raptor for RDF Atom

I spend most of my work day writing twiki pages or going to meetings, so I’ve been doing coding on my own time, and lately I have been refactoring the internals of Raptor’s XML support. I’ll explain why below.

It’s been a long process and probably never ending. The reason I started writing Raptor in October 2000 was to have a conformant RDF/XML parser and to use the best XML parser available. This wasn’t too clear then as you had the choice of:

  • libxml / libxml2: new and good
  • expat 1.95.x: old (i.e. mature), well known, not seeing much development, but also good.

So I made it work with both and as I needed namespace support for RDF/XML, made them both look like they generated something like SAX2 namespace events. At that time, only libxml2 supported namespaces. This libxml/expat + namespace support + RDF/XML parser was all done in one 140K C file. Which was a problem, but the parser did work!
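For the curious, here is roughly what "making a parser look like it generates SAX2 namespace events" means. This is a minimal Python sketch on top of the standard library's expat binding with namespace processing left off, not the C code in Raptor; the function name `sax2_events` and the tuple layout are my own assumptions for illustration. It keeps a stack of prefix declarations and turns each qualified name into a (namespace URI, local name) pair.

```python
import xml.parsers.expat

XML_NS = "http://www.w3.org/XML/1998/namespace"


def sax2_events(xml_text):
    """Return a list of ('start'|'end', namespace_uri, local_name) tuples."""
    events = []
    # Stack of prefix -> URI dictionaries, one entry per open element.
    scopes = [{"xml": XML_NS}]

    def resolve(qname, scope):
        prefix, sep, local = qname.partition(":")
        if not sep:
            # Unprefixed element: use the default namespace, if any.
            return (scope.get("") or None), qname
        # An undeclared prefix raises KeyError, which is good enough here.
        return scope[prefix], local

    def start(name, attrs):
        # Inherit the parent's declarations, then apply this element's.
        scope = dict(scopes[-1])
        for key, value in attrs.items():
            if key == "xmlns":
                scope[""] = value
            elif key.startswith("xmlns:"):
                scope[key[len("xmlns:"):]] = value
        scopes.append(scope)
        events.append(("start",) + resolve(name, scope))

    def end(name):
        events.append(("end",) + resolve(name, scopes[-1]))
        scopes.pop()

    parser = xml.parsers.expat.ParserCreate()  # plain, non-namespace mode
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return events
```

Attributes are ignored to keep it short; a real layer would resolve their names too (without applying the default namespace) and report the declarations themselves as events.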

Raptor slowly grew more features to support the updating of RDF/XML and I became the editor of what would be the revised RDF W3C Recommendation. It added: URIs, URI resolving, URI retrieval, XML Qnames, XML Namespaces, XML Base, Unicode, UTF-8 and an XML Writer for the rdf:parseType="Literal" handling. Plus a few new parsers: N-Triples (I co-created this), Turtle (I created this; there’s a theme here!) and RSS Tag Soup for the 9 flavours of RSS (I have nothing to do with this :) ) plus Atom. Plus a slew of serializers to match, the XML Writer being refactored to its own public API for this. It’s not really an RDF parser library anymore, it’s a web library with support for mapping between syntaxes and RDF triples.

Meanwhile, SAX2 and RDF/XML were still intertwined. Until this week in 2006. Finally I’ve pulled them apart which allows me to make a few neat things possible – the RSS tag soup parser has switched from using libxml-only xmlReader API to the separate SAX2 API so now you can do RSS and Atom with expat too. This also improves the Atom support as it can handle the type='xhtml' and type='xml' markup plus now uses the well-tested xml:base, QNames and Namespaces parts from Raptor. I hope that it’ll also be able to deal with other xml formats inside Atom, so I’m guessing RDF/XML in Atom will be possible. DOAP over Atom anyone?
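To show why reusing the xml:base handling matters for Atom, here is a small sketch of resolving relative link hrefs against inherited xml:base values. It is Python with the standard library's ElementTree, not Raptor's implementation, and the function name `resolved_links` is just something I picked for the example.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree spells namespaced names as "{namespace}local".
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM_LINK = "{http://www.w3.org/2005/Atom}link"


def resolved_links(feed_xml, document_uri):
    """Return absolute URIs for every atom:link href in the feed."""
    links = []

    def walk(element, base):
        # xml:base may itself be relative, so resolve it against the
        # base URI inherited from the parent element.
        base = urljoin(base, element.get(XML_BASE, ""))
        href = element.get("href")
        if element.tag == ATOM_LINK and href is not None:
            links.append(urljoin(base, href))
        for child in element:
            walk(child, base)

    walk(ET.fromstring(feed_xml), document_uri)
    return links
```

A feed with xml:base="http://example.org/blog/" on the root, xml:base="2006/04/" on an entry and href="01/stuff" on its link ends up with the full http://example.org/blog/2006/04/01/stuff, whichever XML parser sits underneath.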

However at this point I’m stopping as the part that has me stumped is how to best represent Atom in RDF triples. The Atom OWL work seems to be going slowly (Aside: also the web site responds very oddly to HTTP wget/curl requests). Mostly I’d like to have readers not have to care that it was Atom or RSS tag soup to begin with, so I’m thinking something like an Atom / RSS1.0 hybrid format.

Handwaving: here’s something I hand-edited:

<item rdf:about="http://example.org/blog/2006/04/01/stuff">
  <!-- the common bits -->
  <title>Stuff</title>
  <link>http://example.org/blog/2006/04/01/stuff</link>
  <description>&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;Content here&lt;/p&gt;&lt;/div&gt;</description>

  <!-- The RSS 1.0 bits -->
  <dc:date>2006-04-02T20:19:00-08:00</dc:date>
  <content:encoded><![CDATA[<div xmlns="http://www.w3.org/1999/xhtml"><p>Content here</p></div>]]></content:encoded>

  <!-- The atom bits -->
  <atom:id>tag:example.org,2006:1234</atom:id>
  <atom:link rdf:parseType="Resource">
    <atom:link-href rdf:resource="http://example.org/blog/2006/04/01/stuff" />
    <atom:link-rel>alternate</atom:link-rel>
  </atom:link>
  <atom:updated>2006-04-02T20:19:00-08:00</atom:updated>
  <atom:content rdf:parseType="Resource">
    <atom:content-type>xhtml</atom:content-type>
    <atom:content-content rdf:parseType="Literal"><![CDATA[<div xmlns="http://www.w3.org/1999/xhtml"><p>Content here</p></div>]]></atom:content-content>
  </atom:content>
</item>
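To make the handwaving a bit more concrete, here is a rough Python sketch of how an item shaped like that one could be flattened into triples. It only understands the three patterns used above (plain text properties, rdf:resource, and rdf:parseType="Resource" for blank nodes), so it is nowhere near a real RDF/XML parser and nothing like Raptor's code. The function name `item_triples` is made up, and since the hand-edited snippet declares no namespaces, whatever URIs you bind to the prefixes when wrapping it are your own assumption.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def item_triples(item_xml):
    """Return (subject, predicate, object) tuples for one <item>."""
    blank_ids = itertools.count(1)
    triples = []

    def predicate(tag):
        # ElementTree gives "{namespace}local"; a predicate URI is the
        # namespace and the local name simply concatenated.
        return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag

    def properties(subject, element):
        for prop in element:  # XML comments are dropped by the parser
            resource = prop.get("{%s}resource" % RDF)
            if resource is not None:
                obj = resource  # URI object
            elif prop.get("{%s}parseType" % RDF) == "Resource":
                obj = "_:b%d" % next(blank_ids)  # blank node
                properties(obj, prop)
            else:
                obj = '"%s"' % (prop.text or "")  # plain literal
            triples.append((subject, predicate(prop.tag), obj))

    item = ET.fromstring(item_xml)
    properties(item.get("{%s}about" % RDF), item)
    return triples
```

Fed the item above (with namespace declarations added), the atom:link block comes out as one triple pointing from the item to a blank node, plus link-href and link-rel triples hanging off that node, which is the kind of shape a reader could consume without caring whether the source was Atom or RSS.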

So why am I working on better Atom support in an RDF parser?

Because Atom 1.0 is the best way to encode data for blog entries. It’s long past time to ditch the horror that is RSS, the worst ambiguously defined XML format since OPML.