Dave Beckett - Journalblog

Hacking the semantic linked data web

Tag: rdf

RDF Syntaxes 2.0

I’ve been diligently ignoring the RDF 2.0 threads on the semantic-web interest list, especially on Syntax since I’ve been there before (Modernising Semantic Web Markup). Firstly I’d endorse what Jeremy Carroll says about the features.

I think I’m qualified as an expert on RDF graph serializations / syntax since:

and I implemented all of the above plus GRDDL, RDFa (via librdfa), Atom and RSS*es, RDF/JSON, … in Raptor

People moan about RDF/XML and have for years. I even wrote down in great detail the flaws in Modernising Semantic Web Markup. Over all that time nobody has come up with a credible and complete XML syntax alternative that stuck, even myself. Let me summarize the ones I know:

  • TriX: had little takeup
  • RXR: ditto
  • GRIT: new, but flawed since it can only represent trees (no named bnodes)

The fundamental problem I think with using XML to write down graphs is:

People looking at XML expect they are looking at a hierarchical Tree.

So writing a Graph in an XML Tree is just going to always fail the simplicity test. This might come from using the XML DOM or looking at HTML, XHTML, but it’s pretty embedded in the mind.

Right now I’d dismiss any XML format for any “simple” or “obvious” way to write down RDF graphs that will be accepted by new users.

(Aside: There’s also a technical argument that no XML format can ever represent all RDF graphs since RDF allows Unicode codepoints that are not allowed in XML).
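As a rough illustration of that aside, the XML 1.0 `Char` production excludes most control characters, so a literal containing, say, U+000B cannot be written directly in any XML 1.0 document. A minimal sketch of the check follows; the function names are made up for this example and are not part of Raptor.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the Unicode code point is matched by the XML 1.0 "Char"
 * production, 0 otherwise. */
static int xml10_is_char(unsigned long c)
{
  return c == 0x9 || c == 0xA || c == 0xD ||
         (c >= 0x20 && c <= 0xD7FF) ||
         (c >= 0xE000 && c <= 0xFFFD) ||
         (c >= 0x10000 && c <= 0x10FFFF);
}

/* Returns the index of the first code point that XML 1.0 cannot carry,
 * or -1 if every code point is representable.
 * (Hypothetical helper for this sketch.) */
static long xml10_first_bad_char(const unsigned long *cps, size_t len)
{
  size_t i;

  assert(cps != NULL || len == 0);

  for (i = 0; i < len; i++)
    if (!xml10_is_char(cps[i]))
      return (long)i;
  return -1;
}
```

Running `xml10_first_bad_char` over the code points of a literal gives the position of the first character that would stop the literal being serialized as XML 1.0.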

Now this isn’t a problem just with XML, it’s also true of other non-XML formats that are serial hierarchical documents. That means formats like JSON, which cannot even out-of-the-box represent anything that is not a tree, since it has no ID/REF mechanism.
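The tree-versus-graph point can be made concrete with a small check: a set of edges nests as a plain tree only if no node is pointed at twice and nothing loops back. The sketch below is illustrative only; the type and function names are hypothetical, node ids are small integers, and predicates are ignored because they do not change the shape.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

/* One edge of the graph: subject node -> object node. */
typedef struct { int subject; int object; } edge;

/* Returns 1 if the edges can be nested as plain trees (every node has at
 * most one parent and there are no cycles), 0 if some node would have to
 * be referred to by name from a second place. Node ids must be in
 * 0..MAX_NODES-1. A repeated identical edge also counts as a second
 * reference. */
static int fits_in_a_tree(const edge *edges, size_t n_edges)
{
  int parent[MAX_NODES];
  size_t i;
  int node;

  assert(edges != NULL || n_edges == 0);

  for (node = 0; node < MAX_NODES; node++)
    parent[node] = -1;

  for (i = 0; i < n_edges; i++) {
    int s = edges[i].subject, o = edges[i].object;
    assert(s >= 0 && s < MAX_NODES && o >= 0 && o < MAX_NODES);
    if (parent[o] != -1)
      return 0;               /* second incoming edge: shared node */
    parent[o] = s;
  }

  /* With at most one parent per node, a cycle shows up as a parent
   * chain that never reaches a root. */
  for (node = 0; node < MAX_NODES; node++) {
    int steps = 0, cur = node;
    while (parent[cur] != -1) {
      cur = parent[cur];
      if (++steps > MAX_NODES)
        return 0;
    }
  }
  return 1;
}
```

Two subjects sharing one object, or two nodes pointing at each other, both make the function return 0. Those are the cases where a nested document format needs some ID/REF mechanism to express the graph.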

Of course, apart from having dealt with RDF/XML, I also invented Turtle (based on the N3 syntax, simplified). Although it’s a non-XML syntax, it does seem to be in the sweet spot for users understanding it, without having the hierarchical document expectation. Yes, Turtle is close to JSON/Python in syntax design space but this doesn’t seem to have been a problem.

So I’m happy with how Turtle turned out and that should be the focus of RDF syntax formats for users. It does need an update and I’ll probably work on that whether or not a new syntax is part of some future working group – I have a pile of fixes to go in. Adding named graphs (TriG) might be the next step for this if it were a standard.

It may be there is a need for a better machine format, but please don’t mix them. Also, machines can read Turtle RDF :)

Consider these stream-of-consciousness RDF syntax thoughts as the basis of my position paper for the W3C RDF Next Steps workshop.

Raptor 1.4.20 RDF syntax library released

Released Raptor 1.4.20 as a bug fix release – no ABI or API changes but fixes for wrong-to-spec bugs, crashes and performance. Raptor 2 will contain ABI/API changes and have new features when it is released – no ETA. My main development focus returns to Rasqal, its new query engine and full SPARQL 1.0 support, which is coming along well.

The main changes in the new Raptor version are:

  • Turtle serializing performance improvement by Chris Cannam
  • librdfa RDFa parser updates to fix empty datatype, xml:lang and 1-char prefixes by Manu Sporny
  • Fix a crash when the GRDDL parser reported errors
  • Enable large file support for 32-bit systems
  • Several resilience improvements by Lauri Aalto
  • Other minor portability and bug fixes
  • Fixed reported issues: 0000306 0000307 0000310 and 0000312.

See the Raptor 1.4.20 Release Notes for the full details of the changes.

Download it from: raptor-1.4.20.tar.gz

Redland RDF Libraries 1.0.8 released

Yesterday I released updates for three of the four Redland packages:

This is a major release due to an ABI/API change in Rasqal that is needed for the future. It only makes sense to upgrade all of these at the same time since they form a dependency chain. So rather than three announcements, I made one and shipped Fedora Core 7 RPMs and Debian deb binaries too.

I also released a new version of the fourth package: Raptor RDF Parser and Serializer Library (raptor) 1.4.18 a few weeks ago with new RDFa parser and atom-triples serializer, but that’s independent (details below). Yesterday I added FC7 RPMs.

The individual package release notes are as follows:

Rasqal RDF Query Library 0.9.16

WARNING: ABI AND API CHANGED in this release. Rasqal 0.9.16 is incompatible with 0.9.15.

  • Added a rasqal_world object used for all constructor functions
  • Removed deprecated functions and macros
  • Fixed some memory leaks and made some low-memory resilience fixes
  • Query result sets can be read/written from sparql XML results format
  • Improved syntax error reporting
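The shape of the first change in that list, a world object passed to every constructor, can be sketched as below. All names here are hypothetical and chosen for the illustration; this shows the general pattern, not Rasqal's actual functions or signatures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is the pattern, not Rasqal's API. */
typedef struct { int live_objects; } demo_world;
typedef struct { demo_world *world; const char *language; } demo_query;

/* Creates the shared context; returns NULL on allocation failure. */
static demo_world *demo_new_world(void)
{
  return calloc(1, sizeof(demo_world));
}

/* Constructors take the world as their first argument instead of
 * relying on hidden global state. */
static demo_query *demo_new_query(demo_world *world, const char *language)
{
  demo_query *q;

  assert(world != NULL);

  q = malloc(sizeof(*q));
  if (!q)
    return NULL;
  q->world = world;
  q->language = language;
  world->live_objects++;
  return q;
}

static void demo_free_query(demo_query *q)
{
  if (!q)
    return;
  q->world->live_objects--;
  free(q);
}

static void demo_free_world(demo_world *world)
{
  free(world);
}
```

Because every constructor gains a parameter, a change of this kind necessarily breaks both API and ABI for callers written against the older, world-less signatures.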

See the Rasqal 0.9.16 Release Notes for the full details of the changes.

Redland RDF Library 1.0.8
  • Updated to use Rasqal 0.9.16 (from 0.9.15)
  • Updated to use Raptor 1.4.18 (from 1.4.16)
  • Added a ‘trees’ indexed in-memory storage
  • Improvements to low-memory and other failures of resource allocation
  • API additions and changes to concepts, parser and serializer classes.
  • Fixed Issue 0000256

NOTE: The next release of redland will NOT include raptor and rasqal in the tarball, they will be separate download dependencies.

See the Redland 1.0.8 Release Notes for the full details of the changes.

Redland RDF Language Bindings Version 1.0.8.1
  • Synchronise with Redland 1.0.8
  • Configuration now prefers pkg-config which helps OSX linking
  • Minor fixes to perl, python and ruby bindings.

See the Redland Bindings 1.0.8.1 Release Notes for the full details of the changes.

Raptor RDF Parser and Serializer Library 1.4.18
  • Added an RDFa parser using an embedded version of librdfa by Manu Sporny of Digital Bazaar.
  • Added an Atom 1.0 (RFC 4287) serializer with several output parameters.
  • Improved RSS 1.0 serializer functionality and resilience.
  • Added new API methods for qname, serializer, sequence and XML writer classes.
  • Many other fixes and resilience improvements.
  • Fixed reported issues: 0000186 and 0000255.

See the Raptor 1.4.18 Release Notes for the full details of the changes.

The Flickcurl Story

In January 2007 I started playing with the Flickr API – the HTTP-based web service that lets you manipulate Flickr. At that point I was using it to play with machine tags and to see how the most popular Web Service API worked, especially in the area of authentication. This was in the days before OAuth if you can remember that far back.

I started with a test program in C that called libcurl and did some of the signing and parameter marshaling of the flickr.photos.getInfo call, which is where all the juicy metadata about photos is. I started thinking about ways to map photo metadata into RDF for manipulating and querying; there is an existing Perl Flickr RDF mapping but it didn’t contain everything. This state of sources was useful; it contained a small library with the one API method plus a command-line utility to call it. Since I was using libcurl to call Flickr, I named it Flickcurl. Also CFlickr was taken! (Flickcurl also uses libxml but flickcurlibxml is just nuts.)
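For readers who never met the pre-OAuth scheme, here is a minimal sketch of the marshaling step. It assumes the original Flickr signing rule, where the call parameters are sorted by name and concatenated as name then value after the shared secret, and the MD5 of that string becomes `api_sig`. The MD5 step is left out to keep the example dependency-free, and the names are hypothetical rather than taken from Flickcurl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *name; const char *value; } api_param;

/* qsort() comparator: order parameters by name. */
static int compare_params(const void *a, const void *b)
{
  return strcmp(((const api_param *)a)->name, ((const api_param *)b)->name);
}

/* Sorts params by name in place, then builds
 *   secret name1 value1 name2 value2 ...
 * with no separators. Returns a malloc()ed string the caller frees,
 * or NULL on allocation failure. The result is what would be hashed. */
static char *signing_base_string(const char *secret,
                                 api_param *params, size_t count)
{
  size_t i, len;
  char *out;

  assert(secret != NULL && (params != NULL || count == 0));

  if (count > 0)
    qsort(params, count, sizeof(*params), compare_params);

  len = strlen(secret);
  for (i = 0; i < count; i++)
    len += strlen(params[i].name) + strlen(params[i].value);

  out = malloc(len + 1);
  if (!out)
    return NULL;

  strcpy(out, secret);
  for (i = 0; i < count; i++) {
    strcat(out, params[i].name);
    strcat(out, params[i].value);
  }
  return out;
}
```

With the secret `SECRET` and parameters `foo=1`, `bar=2`, `baz=3`, the string to hash comes out as `SECRETbar2baz3foo1`.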

Apart from playing with photo metadata I also had some personal reasons to make something new. I wanted a lighter weight and less formal project than the way I had been building the Redland RDF Libraries: more of an “it compiles, ship it” model and less of the unit tests, test cases, continual make check and worrying-about-portability approach. Maybe more fun would be a way to put it. I’m happiest as a free software / open source software tool-builder. At this point in 2007 I was spending a lot of time at work doing non-coding things such as designing specifications and doing technical leadership, so the chance to work on some different code now and then was an appealing counterpoint to the work stuff.

Redland is a set of libraries that have been growing since mid-2000 with more and more features as the semantic web technology stack grows so at any point in time there is no clear end state. For this project I wanted a clear goal to reach so I could be clearly done at some point. This is possible with the Flickr API since there are at any time a finite number of API calls (something like 100) so progress can be measured… although Flickr did add API calls while I was working on it. The result was I made a Flickcurl API coverage page with embedded API changelog (automatically generated from source code comments).

Flickcurl 0.1 was “released” 2007-01-21 although I didn’t announce it to anyone at that point. It was more of a tarball than an actual release.

One more thing I wanted to do was to experiment with different ways to tell people about software, compared to the ways I was using with Redland, which were mostly email based but also via SourceForge and Freshmeat. So for Flickcurl I tried a bunch of different ways:

That was kind of fun, and I also followed a similar light weight process with Triplr but that’s another story. I think caring less worked out fine; people did use it and submit patches. Right now I still use the Flickr mailing list, API group, and freshmeat project.

As the library headed towards 100% of the API and beyond it did get a bit more formal and I imported what I think are the best practices from the Redland libraries:

  • objects in C design
  • always refactoring the source code: refactoring is not just for dynamic languages
  • source code docu-comments generating an HTML API reference via gtk-doc
  • folding in portability fixes
  • make it work with optional libraries for extra functionality (Raptor in this case, to allow serialising to other RDF syntaxes)
  • built in portable ANSI C
  • taking care about memory leaks with valgrind
  • comes with a utility program able to exercise the entire API (called flickcurl)
  • Debian packages (created by somebody else, yay!)
  • manual pages for the command line utilities

The general aim was to get 100% of the Flickr API done by the end of 2007 and I actually reached it for Flickcurl 1.0 on 2008-01-12 which was pretty close.

So right now the library has gone beyond 1.0; the latest release is Flickcurl 1.4 which was released last Tuesday 24th June (see release notes) which primarily added video support but I also updated the photo metadata mapping to RDF by adding a serializer class for abstracting the photo-to-triples process.

The RDF triples mapping is something that has always been there but not part of the core library. It could be optionally used inside Raptor to automatically read Flickr photo URIs as RDF data sources. I doubt it’ll ever be presented inside a public web service like Triplr since it would require passing in Flickr API authentication tokens and user credentials.

The RDF triples mapping I’ve made for the Flickr photo metadata has a mixture of vocabularies which fall into 3 buckets:

  1. Existing Vocabularies: well known RDF schemas (class and properties) that have been developed over many years by multiple people and organisations, sometimes with a lot of formality.
  2. Flickr-specific Vocabularies: vocabularies I made up mostly for Flickr video and places API terms.
  3. Machine Tag Vocabularies: I made them up using machinetags.org/ns URIs as a root for the namespaces associated with the vocabularies. The terms in the vocabularies come from how people used machine tags on Flickr and are not always defined.
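To give a feel for the third bucket, a machine tag of the form `namespace:predicate=value` splits naturally into the parts of a triple. The sketch below is only illustrative: the struct and function names are hypothetical, and the way the predicate URI is assembled (root, namespace, `#`, predicate) is an assumption for this example, not the mapping Flickcurl actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
  char ns[64];
  char predicate[64];
  char value[128];
} machine_tag;

/* Splits "namespace:predicate=value" into its three parts.
 * Returns 1 on success, 0 if the tag is not in that shape or a part
 * is empty or does not fit the fixed-size fields. On failure the
 * output struct is left untouched. */
static int parse_machine_tag(const char *tag, machine_tag *out)
{
  const char *colon, *equals;
  size_t ns_len, pred_len, value_len;

  assert(tag != NULL && out != NULL);

  colon = strchr(tag, ':');
  if (!colon)
    return 0;
  equals = strchr(colon + 1, '=');
  if (!equals)
    return 0;

  ns_len = (size_t)(colon - tag);
  pred_len = (size_t)(equals - colon - 1);
  value_len = strlen(equals + 1);
  if (ns_len == 0 || pred_len == 0 || value_len == 0 ||
      ns_len >= sizeof(out->ns) || pred_len >= sizeof(out->predicate) ||
      value_len >= sizeof(out->value))
    return 0;

  memcpy(out->ns, tag, ns_len);
  out->ns[ns_len] = '\0';
  memcpy(out->predicate, colon + 1, pred_len);
  out->predicate[pred_len] = '\0';
  memcpy(out->value, equals + 1, value_len + 1);
  return 1;
}

/* Builds a predicate URI as <root><ns>#<predicate>. The root and the
 * '#' separator are assumptions made for this sketch. Returns 1 if the
 * URI fit in buf. */
static int machine_tag_predicate_uri(const machine_tag *mt, const char *root,
                                     char *buf, size_t size)
{
  int n;

  assert(mt != NULL && root != NULL && buf != NULL);

  n = snprintf(buf, size, "%s%s#%s", root, mt->ns, mt->predicate);
  return n >= 0 && (size_t)n < size;
}
```

A tag such as `geo:lat=51.5` yields the namespace `geo`, the predicate `lat` and the literal value `51.5`. An ordinary tag with no colon or equals sign is rejected.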

This is a range of what might be called semantic web heavy to light although there is absolutely nothing wrong with mixing things up if you are not worried about inference. This is OK! I should probably put some html/schema documents at the vocabularies and get the redirects and all that # and / business sorted so that the linked data works out properly but what I have now is just a start and I’d be interested to see what people think. There are more details of the vocabularies and terms I’m using in the Flickcurl 1.4 release notes although I should probably add vocabulary information to the documentation too.

That’s all for now but I’ll expand some more in another post about the Flickr API itself and my experience with it and impressions of it as both a software developer and an HTTP Web Service designer.

Raptor Web Library 1.4.17

Last week I released version 1.4.17 of my Raptor C library (release notes) but in the 38 releases over the 7 years or so since I started building it, there’s more to see than just triples.

I/O stream API
Abstracts from the specifics of I/O so that you can read from/write to any of a string/memory buffer, a file, a C FILE* or any custom user data structure. Similar to the C++ idea of a stream. This allows lazy evaluation of I/O and using language-specific I/O routines such as PerlIO (potentially, not yet!). The read abstraction is new in 1.4.17.
Sequence API
For small lists of data items that can grow at either end. This allows small lists, stacks and queues to be made when needed, and iterated over. Do not use for large lists! (Something else is coming along to handle this soon.)
Stringbuffer API
Provides a growing string class, similar to Java’s StringBuffer, which can be added to many times and then the result obtained (with no further changes allowed). This is handy for constructing formatted output, queries using parameters and values which is hard in C without entering the world of char buf[big enough]. The stringbuffer API tries hard to minimise copying.
Unicode API
For a small part of Unicode that is used near XML; UTF-8 encoding and decoding and Normal Form C validation. Nearby it provides XML 1.0 and 1.1 name-checking/validation functions.
URI API
Construction of URIs, resolving of relative URIs and turning absolute URIs into relative. It handles the RFC3986 URI spec updates.
Web fetch API
Retrieving content from a URI in chunks of bytes. It works by calling curl, libxml or BSD libfetch underneath. It also handles redirections and adjusting any base URIs for 30x responses, returning content type headers early so that content negotiation can be done, and customisation of the requests to send appropriate User-Agent and Cache-Control headers (the latter new in 1.4.17).
XML with Namespaces Streaming (SAX2) Reader API (newly public in 1.4.17).
Streaming reading of large XML documents over either libxml2 or expat (build time choice with configure). Expat out of the box does not support namespaces and XML QNames, so Raptor adds those and hides various library differences and some bugs. The XML API also provides full XML Base (xml:base) support.
XML Writer API
Write XML elements as Canonical XML output.
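As a small illustration of what the XML Writer entry above implies, Canonical XML fixes exactly which characters are escaped in element text: `&`, `<`, `>` and carriage return. A minimal sketch of that text-node rule follows; the function names are hypothetical and this is not Raptor's writer code. Attribute values follow a slightly different rule that is not shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the replacement for a character in element text, or NULL if
 * the character is written unchanged. */
static const char *text_escape(char c)
{
  switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '\r': return "&#xD;";
    default:   return NULL;
  }
}

/* Returns a malloc()ed escaped copy of text (caller frees), or NULL on
 * allocation failure. */
static char *escape_xml_text(const char *text)
{
  size_t len = 0;
  const char *p;
  char *out, *q;

  assert(text != NULL);

  /* First pass: measure the output so one allocation is enough. */
  for (p = text; *p; p++) {
    const char *rep = text_escape(*p);
    len += rep ? strlen(rep) : 1;
  }

  out = malloc(len + 1);
  if (!out)
    return NULL;

  /* Second pass: copy, substituting as we go. */
  for (p = text, q = out; *p; p++) {
    const char *rep = text_escape(*p);
    if (rep) {
      size_t n = strlen(rep);
      memcpy(q, rep, n);
      q += n;
    } else {
      *q++ = *p;
    }
  }
  *q = '\0';
  return out;
}
```

Measuring first and copying second keeps the function to a single allocation, which is a common choice when the output size is cheap to compute.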

Some of the above are large pieces of work and some are small, but they are all solid and many have been used for multiple years in production. These turned out to be handy datatype classes for web programming and I needed them since RDF is built with web technology.
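To make one of those datatype classes concrete, the growing-string idea behind the Stringbuffer API can be sketched in a few lines. This is a simplified stand-in with hypothetical names, not Raptor's implementation. In particular it copies every appended string, whereas the real API is described above as trying hard to minimise copying.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  char *data;      /* always NUL-terminated once non-NULL */
  size_t length;   /* bytes used, excluding the NUL */
  size_t capacity; /* bytes allocated */
} strbuf;

static void strbuf_init(strbuf *sb)
{
  assert(sb != NULL);
  sb->data = NULL;
  sb->length = 0;
  sb->capacity = 0;
}

/* Appends s, doubling the allocation when needed so that repeated
 * appends stay cheap. Returns 0 on success, non-0 on failure. */
static int strbuf_append(strbuf *sb, const char *s)
{
  size_t add, need;

  assert(sb != NULL && s != NULL);

  add = strlen(s);
  need = sb->length + add + 1;
  if (need > sb->capacity) {
    size_t cap = sb->capacity ? sb->capacity : 16;
    char *grown;
    while (cap < need)
      cap *= 2;
    grown = realloc(sb->data, cap);
    if (!grown)
      return 1;
    sb->data = grown;
    sb->capacity = cap;
  }
  memcpy(sb->data + sb->length, s, add + 1);
  sb->length += add;
  return 0;
}

/* Hands the finished string to the caller (who frees it) and resets
 * the buffer, so nothing more can be added to that result. */
static char *strbuf_finish(strbuf *sb)
{
  char *result;

  assert(sb != NULL);

  result = sb->data;
  if (!result) {
    result = malloc(1);
    if (result)
      result[0] = '\0';
  }
  strbuf_init(sb);
  return result;
}
```

Building a query from fixed text plus a parameter then becomes a series of `strbuf_append` calls followed by one `strbuf_finish`, with no fixed-size `char buf[...]` to guess at.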

The bonus is that all of the above is used to provide the signature features of Raptor: RDF Parsing – turning syntax into triples and RDF Serializing – turning triples into syntax. Raptor now parses 7 syntaxes (GRDDL, N-Triples, RDF/XML, RSSes, Atom, TRiG, Turtle) and serializes to 10 (Atom 1.0, GraphViz DOT, JSON * 2, N-Triples, RDF/XML * 3, RSS 1.0, Turtle). The JSON outputs are new for 1.4.17.
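“Turning triples into syntax” is easiest to see with N-Triples, the simplest of those formats: one triple per line, URIs in angle brackets, literals in double quotes. The sketch below is a simplified stand-in rather than Raptor's serializer. The function name is hypothetical, only a plain literal object is supported, and only the backslash, quote, newline, carriage return and tab escapes are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one triple with URI subject and predicate and a plain literal
 * object as an N-Triples line into buf. Non-ASCII characters are not
 * escaped here. Returns 1 if the line fit, 0 otherwise. */
static int write_ntriple(char *buf, size_t size, const char *subject_uri,
                         const char *predicate_uri, const char *literal)
{
  size_t used;
  const char *p;
  int n;

  assert(buf && subject_uri && predicate_uri && literal);

  n = snprintf(buf, size, "<%s> <%s> \"", subject_uri, predicate_uri);
  if (n < 0 || (size_t)n >= size)
    return 0;
  used = (size_t)n;

  for (p = literal; *p; p++) {
    const char *esc = NULL;
    switch (*p) {
      case '\\': esc = "\\\\"; break;
      case '"':  esc = "\\\""; break;
      case '\n': esc = "\\n";  break;
      case '\r': esc = "\\r";  break;
      case '\t': esc = "\\t";  break;
      default:   break;
    }
    if (esc) {
      if (used + 2 >= size)
        return 0;
      buf[used++] = esc[0];
      buf[used++] = esc[1];
    } else {
      if (used + 1 >= size)
        return 0;
      buf[used++] = *p;
    }
  }

  /* Closing quote, space, dot, newline and the terminating NUL. */
  if (used + 5 > size)
    return 0;
  memcpy(buf + used, "\" .\n", 5);
  return 1;
}
```

Parsing is the same mapping run in reverse, which is why a line-oriented format like this is the usual first target when testing a new parser or serializer.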

So although Raptor deals with all the RDF syntax details, it does a lot more. But I’m not changing the name!