Dave Beckett - Journalblog

Hacking the semantic linked data web

Month: January, 2010

Raptor 1.4.21 released – Raptor 2 GIT work

I just released version 1.4.21 of my Raptor RDF parsing / serialising library to the world. This release is just bug fixes:

  • RDFa parser buffer management problems were fixed.
  • The Turtle parser and serializers now use QNames correctly as required by the specification.
  • The RDF/XML parser now resets correctly to detect duplicate rdf:IDs when a parser object is reused.
  • A few other minor bug and build fixes were made.
  • Fixed reported issues: 0000318, 0000319, 0000326, 0000331, 0000332 and 0000337

This is the first release since switching to GIT as the source control for the Redland libraries. The above release is on branch ‘raptor1’ in the new Redland GIT.

In parallel to this is the ongoing Raptor 2 ABI/API updating, which is cleaning up 10 years of API and internal cruft. GIT is really helping speed up this work: the branching, staging/index and stash concepts it supports make it easy to manage false paths. The results can be seen on branch ‘master’ of raptor.

The updating is going well in the sense that the make distcheck test suite passes, but there are still things to decide, including:

  • Rename all raptor_CLASS_copy copy constructors to something else: either raptor_new_CLASS_from_CLASS (also used in raptor – Doh!) or raptor_CLASS_addref, which better signifies that it just adds a reference to the object: it’s a shallow copy, not a deep one.
  • Unify raptor_world, rasqal_world and librdf_world – which might help share classes between the libraries. Not sure if this is a good idea yet.
  • Add a graph term to the (subject, predicate, object) triple returned from parsing. I am probably going to do this.
  • Turn the raptor_locator object into more of a log (like librdf_log) or exception object, with inner log/exceptions.
  • Improve the callback interface that passes error, warning etc. messages to user code.
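To illustrate the difference the addref naming is meant to signal, here is a minimal sketch in Python rather than Raptor's C. The Term class and the term_addref, term_deep_copy and term_free helpers are hypothetical names made up for this example; they are not Raptor functions and say nothing about how Raptor implements reference counting.

```python
import copy


class Term:
    """Hypothetical reference-counted object (an assumption, not a Raptor class)."""

    def __init__(self, value):
        self.value = value
        self.usage = 1  # one reference, held by whoever created the object


def term_addref(term):
    """Shallow 'copy': return the same object with one more reference."""
    term.usage += 1
    return term


def term_deep_copy(term):
    """Deep copy: a new, independent object with its own reference count."""
    return Term(copy.deepcopy(term.value))


def term_free(term):
    """Drop one reference; return True once the last reference has gone."""
    term.usage -= 1
    return term.usage == 0


original = Term(["http://example.org/a"])
shared = term_addref(original)          # same object, usage is now 2
independent = term_deep_copy(original)  # separate object, usage is 1

# A change made through the original is visible through the added reference,
# but not through the deep copy.
original.value.append("changed")
```

With an addref-style call, both holders must release their reference before the object can really be destroyed, which is the behaviour a name like "copy" tends to hide.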

I need to decide at what point to roll out an alpha release of Raptor 2, which will probably be numbered 1.9.0. Some of the above possibilities might be worth putting in a later alpha release.

This can all be seen in the GIT repository at git.librdf.org, which includes instructions for checkout.

RDF Syntaxes 2.0

I’ve been diligently ignoring the RDF 2.0 threads on the semantic-web interest list, especially on Syntax since I’ve been there before (Modernising Semantic Web Markup). Firstly I’d endorse what Jeremy Carroll says about the features.

I think I’m qualified as an expert on RDF graph serializations / syntax since:

and I implemented all of the above plus GRDDL, RDFa (via librdfa), Atom and RSS*es, RDF/JSON, … in Raptor

People moan about RDF/XML and have for years. I even wrote down in great detail the flaws in Modernising Semantic Web Markup. Over all that time nobody has come up with a credible and complete XML syntax alternative that stuck, even myself. Let me summarize the ones I know:

  • TriX: had little takeup
  • RXR: ditto
  • GRIT: new, but flawed since it can only represent trees (no named bnodes)

The fundamental problem I think with using XML to write down graphs is:

People looking at XML expect they are looking at a hierarchical Tree.

So writing a Graph in an XML Tree is just going to always fail the simplicity test. This might come from using the XML DOM or looking at HTML, XHTML, but it’s pretty embedded in the mind.

Right now I’d dismiss any XML format as a “simple” or “obvious” way to write down RDF graphs that will be accepted by new users.

(Aside: There’s also a technical argument that no XML format can ever represent all RDF graphs since RDF allows Unicode codepoints that are not allowed in XML).
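As a small sketch of that aside, the following uses Python's standard xml.etree.ElementTree parser as one example of an XML 1.0 parser. The is_well_formed helper and the literal element name are made up for the illustration. U+0001 is picked as one codepoint that XML 1.0 excludes from its Char production, either raw or as a character reference.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the parser accepts `document`, False on a parse error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Ordinary text content is accepted.
plain_ok = is_well_formed("<literal>hello</literal>")

# U+0001 written directly into the element content.
raw_control_ok = is_well_formed("<literal>\u0001</literal>")

# The same codepoint written as a numeric character reference.
escaped_control_ok = is_well_formed("<literal>&#1;</literal>")
```

Under these assumptions only the first document parses, so a literal containing that codepoint has no direct spelling in the XML document at all.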

Now this isn’t a problem just with XML, it’s also true of other non-XML formats that are serial hierarchical documents. That means formats like JSON, which cannot even out-of-the-box represent anything that is not a tree, since it has no ID/REF mechanism.
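A quick sketch of the JSON point, using Python's standard json module as one JSON implementation. The alice, bob and carol data is invented for the example. It shows two things: a cyclic graph cannot be written out at all, and a node shared by two parents comes back as two separate copies.

```python
import json

# A two-node graph with a cycle: alice knows bob, and bob knows alice.
alice = {"name": "Alice"}
bob = {"name": "Bob", "knows": alice}
alice["knows"] = bob

try:
    json.dumps(alice)
    cycle_serialized = True
except ValueError:
    # The encoder detects the circular reference and refuses to serialize it.
    cycle_serialized = False

# No cycle this time, but one node is referenced from two places.
carol = {"name": "Carol"}
document = {"author": carol, "editor": carol}
round_tripped = json.loads(json.dumps(document))

# In memory the two references are the same node.
shared_before = document["author"] is document["editor"]

# After a round trip they are two equal but unrelated objects.
shared_after = round_tripped["author"] is round_tripped["editor"]
```

Any convention for getting the sharing back, such as giving each node an identifier and referring to it by that identifier, has to be layered on top of JSON by the application.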

Of course, apart from having dealt with RDF/XML, I also invented Turtle (based on the N3 syntax, simplified) and, although it’s a non-XML syntax, it does seem to be in the sweet spot for users understanding it, without having the hierarchical document expectation. Yes, Turtle is close to JSON/Python in syntax design space but this doesn’t seem to have been a problem.

So I’m happy with how Turtle turned out and that should be the focus of RDF syntax formats for users. It does need an update and I’ll probably work on that whether or not a new syntax is part of some future working group – I have a pile of fixes to go in. Adding named graphs (TRIG) might be the next step for this if it was a standard.

It may be there is a need for a better machine format, but please don’t mix them. Also, machines can read Turtle RDF :)

Consider these stream-of-consciousness RDF syntax thoughts as the basis of my position paper for the W3C RDF Next Steps workshop.