Dave Beckett - Journalblog

Hacking the semantic linked data web

Month: October, 2004

Rasqal 0.9.3 now with SPARQL

Today I announced the release of Rasqal 0.9.3, which follows a day after Raptor 1.4.0 was announced, and for good reason: it needs it.

The main reasons for this release are, firstly, continuing the license change to LGPL 2.1/Apache 2.0 for the Redland libraries and, more importantly, adding initial support for the draft SPARQL Query Language for RDF, which was published as a first working draft on 2004-10-12 by the RDF Data Access Working Group (DAWG). This is by no means a complete implementation of the entire language: it parses all the syntax and has an engine that executes the core language, roughly matching the RDQL support already in Rasqal. The Rasqal SPARQL To Do list covers all the parts not implemented.

So, in the same style as I announced Raptor 1.4.0, here’s what you can do on the command line:

$ roqet -i sparql \
   -s http://www.w3.org/2001/sw/DataAccess/tests/data/simple/data-01.n3 \
   http://www.w3.org/2001/sw/DataAccess/tests/data/simple/dawg-tp-01.rq

result: [p=uri<http://example.org/data/p>, q=uri<http://example.org/data/v1>]
result: [p=uri<http://example.org/data/p>, q=uri<http://example.org/data/v2>]

The command line is a bit verbose (newlines added for clarity) but it really does execute the SPARQL query and take the data from the web. The DAWG is working on an XML output format.
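Until a standard output format exists, the result lines shown above can be picked apart by a script. The following is only a rough sketch in Python: the function name parse_result_line is made up for illustration, and the pattern only handles the name=uri<...> form that appears in the output above, so other value forms are silently skipped.

```python
import re

# Matches one binding of the form name=kind<value>,
# e.g. p=uri<http://example.org/data/p>. Only this form, as seen in the
# sample output, is handled (an assumption, not the full roqet format).
BINDING = re.compile(r"(\w+)=(\w+)<([^>]*)>")


def parse_result_line(line):
    """Turn one 'result: [...]' line into {variable: (kind, value)}."""
    if not line.startswith("result:"):
        return None
    return {name: (kind, value) for name, kind, value in BINDING.findall(line)}


# The two result lines copied from the roqet run above.
output = """\
result: [p=uri<http://example.org/data/p>, q=uri<http://example.org/data/v1>]
result: [p=uri<http://example.org/data/p>, q=uri<http://example.org/data/v2>]
"""

rows = [parse_result_line(line) for line in output.splitlines()]
rows = [row for row in rows if row is not None]

for row in rows:
    print(row["q"][1])
```

Each row is a plain dictionary keyed by variable name, which is close in spirit to the Hashtable the C# binding hands back.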

And of course, the language APIs have not been forgotten. Redland already provides Perl, Python and C# interfaces for querying, so for variety here’s the C#:

// Run the query against the model; results are walked one at a time
QueryResults qr = model.Execute (new Query (query_string) );
while (!qr.End) {
  // Each result is a table of variable name -> bound value
  Hashtable result = (Hashtable) qr.Current;
  Console.WriteLine("Result:");
  IDictionaryEnumerator enumerator = result.GetEnumerator ();
  while (enumerator.MoveNext ())
    Console.WriteLine("  {0} = {1}", enumerator.Key, enumerator.Value);
  // Advance to the next result
  qr.MoveNext();
}

(This can likely be made more C#-idiomatic)

Raptor 1.4.0 – Serializing RDF

Today I announced the release of Raptor 1.4.0, which adds a major new piece of functionality to Raptor: serializing. Although this is a major change, it didn’t break older APIs, so the temptation to call it raptor2 was only slight.

This means Raptor can now do both parsing (syntax to RDF triples) and serializing (RDF triples to syntax). All the icky syntax details are now available in one library :) I’ve been able to delete code from Redland, which is great.

Or in terms of the command line rapper utility:

rapper -i rdfxml -o ntriples file.rdf > file.nt
rapper -i ntriples -o rdfxml file.nt > file2.rdf

which won’t give the exact same file out, but it will encode the same triples. (The first example above is actually the default with no options.)
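To make the "same triples, different file" point concrete, here is a small sketch in Python. It does not call Raptor at all: the two N-Triples snippets are invented sample data, and read_ntriples is a hypothetical helper that compares lines naively. It would be fooled by renamed blank nodes or by runs of whitespace inside literals, so treat it as an illustration rather than a real graph comparison.

```python
def read_ntriples(text):
    """Return the set of triple lines, ignoring blank lines and comments.

    Naive by design: runs of whitespace are collapsed, so this is only
    safe for sample data without such runs inside literals.
    """
    triples = set()
    for line in text.splitlines():
        line = " ".join(line.split())
        if line and not line.startswith("#"):
            triples.add(line)
    return triples


# Invented sample data standing in for file.nt before and after a round trip.
original = """\
<http://example.org/a> <http://example.org/p> "one" .
<http://example.org/a> <http://example.org/q> <http://example.org/b> .
"""
round_tripped = """\
<http://example.org/a> <http://example.org/q> <http://example.org/b> .
<http://example.org/a>   <http://example.org/p>   "one" .
"""

same_bytes = original == round_tripped
same_triples = read_ntriples(original) == read_ntriples(round_tripped)
print(same_bytes, same_triples)
```

The files differ as text, yet the sets of triples are equal, because RDF triples form a set with no inherent order.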

Ob Metacomment: in an email I sent to redland-dev answering a question earlier this week, I said I’d get round to doing serializing sometime when I had a free weekend. As it happens, it took only a couple of lunchtimes this week and a few evenings to get it going. The rest of the evenings and this weekend were needed just for testing that it worked with Redland and Redland Bindings, and then more testing and release management of Raptor. So right now, the CVS Redland called via the Python binding can do:

$ python
>>> import RDF
>>> model=RDF.Model(storage=RDF.MemoryStorage())
>>> model.load("http://planetrdf.com/index.rdf")
>>> print len(model.to_string(name="ntriples"))
174836
>>> print len(model.to_string())
206002

(and for the Pythoneers reading, yes, I overload __str__ so print len(str(model)) works too)

Munging Planet RDF

Sam Ruby says in the slide Munging, from his slides on the pitfalls around Unicode, XML and HTTP:
“Planet RDF will take HTML and run it through a iso-8859-1 to utf-8 conversion”
This is not quite correct.

The code behind PlanetRDF uses the source blogroll to get the RSS feed URIs. These are fetched and parsed using the Ultra-liberal RSS parser, giving Unicode inside Python. This data is used to create a skeleton HTML document in UTF-8, which is passed to tidy to try to fix HTML escaping and tagging messes. Tidy is told to read and write UTF-8. The aggregation is then performed and the result is a new RDF/XML (RSS 1.0) feed in UTF-8, which is then XSLTed into XHTML in UTF-8. There is no explicit transcoding. If there is a problem, it’ll be at the first RSS stage.

There are sometimes encoding errors in titles in the main page body. These are due to Python mishandling tidy’s output: tidy emits UTF-8 encoded bytes and Python attempts to read them as ASCII. The right hand side is always correct, since it is all done in RDF from the source blogroll, with no munging.
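The failure mode is easy to reproduce in isolation. This is a minimal sketch, not the PlanetRDF code: the title string is made up, and the ASCII decode is written out explicitly here, whereas in the Python of the time it could happen implicitly when byte strings and Unicode strings were mixed.

```python
# Bytes as tidy might emit them: UTF-8 encoded text containing a
# non-ASCII character (the title itself is invented for this example).
tidy_output = "Caf\u00e9 RDF".encode("utf-8")

# Reading those bytes as ASCII fails on the first byte above 0x7F.
try:
    tidy_output.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the encoding tidy was told to write recovers the text.
title = tidy_output.decode("utf-8")
print(ascii_ok, title)
```

Decoding at the boundary with the encoding tidy was told to use keeps everything after that point in Unicode.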

I guess it’s time to junk the “Ultra-liberal” parser and replace it with a real one, and as all PlanetRDF feeds are RSS 1.0, not RSS tag soup, we can use an RDF/XML parser. At that point PlanetRDF will be triples all the way down :)

More detail of how PlanetRDF works was given in Planet Blog by Edd Dumbill.

Redland and RDF inside Ubuntu

I had a nice surprise at the weekend while playing with a daily CD of the new Debian-based commercial distribution Ubuntu. As I watched the packages being installed and whooshing up the screen, I was surprised to see libraptor1 and then librdf0 fly by. It turns out that because Ubuntu includes a full Python developer’s environment, it includes all of Debian unstable’s Python packages, which include Redland’s Python API in the package python2.3-librdf.

Nice!

This beats Redland libraries appearing on SUSE or other Linuxes, since on Ubuntu Redland is installed by default and is provided on CD1.

To celebrate this, I’ve made updated Redland Ubuntu testing packages, including rasqal for RDF query, to go along with the already existing Redland Debian unstable packages (also in Debian’s archive on 10 architectures).