RDF Syntaxes 2.0

I’ve been diligently ignoring the RDF 2.0 threads on the semantic-web interest list, especially those on syntax, since I’ve been there before (Modernising Semantic Web Markup). First, I’d endorse what Jeremy Carroll says about the features.

I think I’m qualified as an expert on RDF graph serializations / syntax since:

and I implemented all of the above plus GRDDL, RDFa (via librdfa), Atom and RSS*es, RDF/JSON, … in Raptor

People moan about RDF/XML and have for years. I even wrote down its flaws in great detail in Modernising Semantic Web Markup. Over all that time nobody, myself included, has come up with a credible and complete alternative XML syntax that stuck. Let me summarize the ones I know:

  • TriX: had little takeup
  • RXR: ditto
  • GRIT: new, but flawed since it can only represent trees (no named bnodes)

The fundamental problem I think with using XML to write down graphs is:

People looking at XML expect they are looking at a hierarchical Tree.

So writing a Graph in an XML Tree is always going to fail the simplicity test. The expectation might come from using the XML DOM or from looking at HTML and XHTML, but it’s pretty embedded in the mind.

Right now I’d dismiss any XML format as a “simple” or “obvious” way to write down RDF graphs that new users will accept.

(Aside: There’s also a technical argument that no XML format can ever represent all RDF graphs since RDF allows Unicode codepoints that are not allowed in XML).
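To make the aside concrete, here is a minimal Python sketch that checks a string against the XML 1.0 `Char` production. The function names and the sample literal are illustrative assumptions, not taken from any RDF library, and the check covers XML 1.0 only.

```python
def is_xml10_char(cp: int) -> bool:
    """True if the codepoint matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def xml10_forbidden(text: str) -> list[str]:
    """List the codepoints in text that XML 1.0 cannot carry."""
    return [f"U+{ord(ch):04X}" for ch in text if not is_xml10_char(ord(ch))]


# A hypothetical literal value containing two control characters.
literal = "bell\u0007 and form feed\u000c"
print(xml10_forbidden(literal))  # ['U+0007', 'U+000C']
```

A literal like this has no place to go in an XML 1.0 document, however the rest of the format is designed.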

Now this isn’t a problem just with XML; it’s also true of other non-XML formats that are serial hierarchical documents. That includes formats like JSON, which out of the box cannot even represent anything that is not a tree, since it has no ID/REF mechanism.
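The JSON point can be seen with Python's standard `json` module. This is only a sketch: the node names (`alice`, `bob`, `carol`) and keys are made up for illustration. A cycle cannot be serialized at all, and a node shared by two parents comes back as two unrelated copies.

```python
import json

# Two nodes that point at each other: a graph with a cycle.
alice = {"name": "Alice"}
bob = {"name": "Bob", "knows": alice}
alice["knows"] = bob

try:
    json.dumps(alice)
    cycle_error = None
except ValueError as exc:
    # The encoder refuses: JSON has no way to point back at a node.
    cycle_error = str(exc)

# One node referenced from two places: a graph that is not a tree.
carol = {"name": "Carol"}
doc = {"alice_knows": carol, "bob_knows": carol}
round_tripped = json.loads(json.dumps(doc))

shared_before = doc["alice_knows"] is doc["bob_knows"]
shared_after = round_tripped["alice_knows"] is round_tripped["bob_knows"]
print(cycle_error is not None, shared_before, shared_after)  # True True False
```

Any JSON-based graph format therefore has to invent its own identifier convention on top of JSON to get the sharing back.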

Of course, apart from having dealt with RDF/XML, I also invented Turtle (based on the N3 syntax, simplified). Although it’s a non-XML syntax, it does seem to be in the sweet spot for users understanding it, without carrying the hierarchical document expectation. Yes, Turtle is close to JSON/Python in syntax design space, but this doesn’t seem to have been a problem.
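As a rough illustration of why the flat style helps, the Python sketch below emits a tiny Turtle document in which one blank node is referenced from two subjects. The `to_turtle` helper, the `http://example.org/` namespace and the data are assumptions made up for this example; terms are passed in already written in Turtle form, so this is not a general serializer.

```python
def to_turtle(prefixes: dict[str, str], triples: list[tuple[str, str, str]]) -> str:
    """Write prefix declarations, then one 'subject predicate object .' line per triple."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines)


prefixes = {
    "": "http://example.org/",  # hypothetical namespace
    "foaf": "http://xmlns.com/foaf/0.1/",
}
triples = [
    (":alice", "foaf:knows", "_:b1"),
    (":bob", "foaf:knows", "_:b1"),  # same blank node, second reference
    ("_:b1", "foaf:name", '"Carol"'),
]

turtle = to_turtle(prefixes, triples)
print(turtle)
```

The label `_:b1` simply appears wherever the node is needed, so nothing on the page suggests that one statement is nested inside another.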

So I’m happy with how Turtle turned out, and it should be the focus of RDF syntax formats for users. It does need an update, and I’ll probably work on that whether or not a new syntax is part of some future working group; I have a pile of fixes to go in. Adding named graphs (TriG) might be the next step for this if it were a standard.

It may be there is a need for a better machine format, but please don’t mix them. Also, machines can read Turtle RDF :)

Consider these stream-of-consciousness RDF syntax thoughts as the basis of my position paper for the W3C RDF Next Steps workshop.

  • drewpca

    Trig isn't a standard? :)

    As a user, I'm quite happy with trig and ntriples, and I mostly wish there was more adoption of that ntriples+graph syntax.

  • http://www.dajobe.org/ Dave Beckett

    It's not as much of a standard as Turtle, in the sense of takeup and implementations. As I said at the end, adding named graphs to Turtle would be the next step, i.e. in the direction of TriG.

  • Tom Passin

    I think the problem isn't so much that XML leads people to think in terms of trees. Basically, whatever the syntax, you will be writing it line by line, i.e., serially. As soon as there is anything like nesting, it's going to look like a tree. Turtle (for example), is generally written in a fairly flat manner, not with a large number of nested parts. So it doesn't have the feel of a tree. OTOH, it doesn't have the feel of a graph, either.

    Personally, I dealt with using RDF/XML by choosing a particular subset of the language. I chose a collection of constructions that made sense to me when I read them, and that also made it easier for me to write. The exercise of finding the right constructions also helped me to understand the resulting RDF graphs better.

    Of course, you can't be asking most people to go through a complex exercise like that when they just want to bang out some data!

  • thomaskappler

    Seeing that Turtle is well accepted in the Semantic Web world and JSON is very widespread and accepted in web development in general, and they are fairly close, it might be beneficial to define a standard mapping between the two. It would be fairly simple, mainly specifying how to express references in JSON. Then it would be very easy to plug components emitting RDF into existing web applications and frameworks. For instance, you could easily format RDF for human readers using any of the JavaScript UI libraries, or you could use CouchDB as a replicated RDF store without any modifications (though without SPARQL, of course).

  • http://www.dajobe.org/ Dave Beckett

    This has already been done (and I implemented both in Raptor). RDF/JSON at http://n2.talis.com/wiki/RDF_JSON_Specification has been implemented by a few people and is necessarily more verbose, since JSON doesn't have builtins for URIs or RDF literals (datatypes, languages). There's also JSON-triples, which is essentially N-Triples in JSON: regular but very verbose.
