In January 2007 I started playing with the Flickr API – the HTTP-based web service that lets you manipulate Flickr. At that point I was using it to play with machine tags and to see how the most popular Web Service API worked, especially in the area of authentication. This was in the days before OAuth if you can remember that far back.
I started with a test program in C that called libcurl and did some of the signing and parameter marshaling of the flickr.photos.getInfo call, which is where all the juicy metadata about photos is. I started thinking about ways to map photo metadata into RDF for manipulating and querying; there is an existing Perl Flickr RDF mapping but it didn’t contain everything. The source at that stage was already useful: a small library with the one API method plus a command-line utility to call it. Since I was using libcurl to call Flickr, I named it Flickcurl. Also, CFlickr was taken! (Flickcurl also uses libxml, but flickcurlibxml is just nuts.)
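A minimal sketch of that signing step, assuming Flickr's legacy (pre-OAuth) scheme as I understand it: take the shared secret, append each parameter name and value with the parameters sorted by name, and use the MD5 hex digest of that string as api_sig. The sketch builds only the string to be hashed; the MD5 step is left out because it needs a hash implementation (for example OpenSSL's). The names api_param and build_signature_base are hypothetical, not Flickcurl's API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration: one name/value parameter of an API call. */
typedef struct {
  const char *name;
  const char *value;
} api_param;

/* qsort comparator: order parameters byte-wise by name. */
static int compare_params(const void *a, const void *b)
{
  const api_param *pa = (const api_param *)a;
  const api_param *pb = (const api_param *)b;
  return strcmp(pa->name, pb->name);
}

/*
 * Hypothetical helper: build "secret" + name1 + value1 + name2 + value2 ...
 * with parameters sorted by name (sorts params in place).
 * Returns a malloc()ed string the caller must free, or NULL on failure.
 * Assumption: the caller MD5-hashes the result elsewhere to get api_sig.
 */
char *build_signature_base(const char *secret, api_param *params, size_t count)
{
  size_t len = strlen(secret);
  size_t i;
  char *buf, *p;

  qsort(params, count, sizeof(*params), compare_params);

  /* First pass: total length of secret plus every name and value. */
  for (i = 0; i < count; i++)
    len += strlen(params[i].name) + strlen(params[i].value);

  buf = (char *)malloc(len + 1);
  if (!buf)
    return NULL;

  /* Second pass: concatenate in sorted order. */
  p = buf;
  strcpy(p, secret);
  p += strlen(secret);
  for (i = 0; i < count; i++) {
    strcpy(p, params[i].name);
    p += strlen(params[i].name);
    strcpy(p, params[i].value);
    p += strlen(params[i].value);
  }
  *p = '\0';
  return buf;
}
```

Sorting before concatenating is what makes the signature independent of the order in which a client happens to add parameters.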
Apart from playing with photo metadata, I also had some personal reasons to make something new. I wanted a lighter-weight, less formal project than the way I had been building the Redland RDF Libraries: more of an “it compiles, ship it” model and less of the unit tests, test cases, continual make check and worrying-about-portability approach. Maybe “more fun” would be a way to put it. I’m happiest as a free software / open source software tool-builder, and at this point in 2007 I was spending a lot of time at work doing non-coding things such as designing specifications and providing technical leadership, so the chance to work on some different code now and then was an appealing counterpoint to the work stuff.
Redland is a set of libraries that have been growing since mid-2000, gaining more and more features as the semantic web technology stack grows, so at any point in time there is no clear end state. For this project I wanted a clear goal so that I could definitely be done at some point. This is possible with the Flickr API since at any time there is a finite number of API calls (something like 100), so progress can be measured… although Flickr did add API calls while I was working on it. As a result, I made a Flickcurl API coverage page with an embedded API changelog (automatically generated from source code comments).
Flickcurl 0.1 was “released” 2007-01-21 although I didn’t announce it to anyone at that point. It was more of a tarball than an actual release.
One more thing I wanted to do was to experiment with different ways to tell people about software, compared to the ways I was using with Redland, which were mostly email based but also via SourceForge and Freshmeat. So for Flickcurl I tried a bunch of different ways.
That was kind of fun, and I also followed a similar lightweight process with Triplr, but that’s another story. I think caring less worked out fine; people did use it and submit patches. Right now I still use the Flickr mailing list, API group and Freshmeat project.
As the library headed towards 100% of the API and beyond it did get a bit more formal and I imported what I think are the best practices from the Redland libraries:
- objects in C design
- always refactoring the source code: refactoring is not just for dynamic languages
- source code docu-comments generating an HTML API reference via gtk-doc
- folding in portability fixes
- make it work with optional libraries for extra functionality (Raptor in this case, to allow serialising to other RDF syntaxes)
- written in portable ANSI C
- checking for memory leaks with valgrind
- comes with a utility program able to exercise the entire API (called flickcurl)
- Debian packages (created by somebody else, yay!)
- manual pages for the command line utilities
The general aim was to get 100% of the Flickr API done by the end of 2007 and I actually reached it for Flickcurl 1.0 on 2008-01-12 which was pretty close.
So right now the library has gone beyond 1.0; the latest release is Flickcurl 1.4, released last Tuesday, 24th June (see release notes). It primarily added video support, but I also updated the photo metadata mapping to RDF by adding a serializer class that abstracts the photo-to-triples process.
The RDF triples mapping has always been there but was not part of the core library. It could optionally be used inside Raptor to automatically read Flickr photo URIs as RDF data sources. I doubt it’ll ever be offered as a public web service like Triplr, since that would require passing in Flickr API authentication tokens and user credentials.
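As a rough illustration of what a serializer abstraction for photo-to-triples buys you, here is a minimal sketch in which the mapping code only emits triples through a handler callback and never knows where they end up. Everything here is hypothetical (demo_photo, demo_serializer, and the choice of Dublin Core title and description as predicates); Flickcurl's real structures and mapping are considerably richer.

```c
#include <stdio.h>

/* Hypothetical, cut-down photo record; not Flickcurl's real structure. */
typedef struct {
  const char *uri;          /* photo page URI, used as the triple subject */
  const char *title;        /* may be NULL */
  const char *description;  /* may be NULL */
} demo_photo;

/* Hypothetical callback invoked once per generated triple (object is a literal). */
typedef void (*demo_triple_handler)(void *user_data, const char *subject,
                                    const char *predicate, const char *object);

/* The "serializer": mapping code talks only to this handler, so the
 * destination (an RDF serializer, a file, a test) can be swapped freely. */
typedef struct {
  demo_triple_handler handler;
  void *user_data;
} demo_serializer;

#define DC_NS "http://purl.org/dc/elements/1.1/"

/* Map one photo to triples, skipping absent fields; returns triples emitted. */
int demo_serialize_photo(demo_serializer *ser, const demo_photo *photo)
{
  int count = 0;
  if (photo->title) {
    ser->handler(ser->user_data, photo->uri, DC_NS "title", photo->title);
    count++;
  }
  if (photo->description) {
    ser->handler(ser->user_data, photo->uri, DC_NS "description",
                 photo->description);
    count++;
  }
  return count;
}

/* Example handler: print an N-Triples-like line to a FILE* (no escaping). */
void demo_print_triple(void *user_data, const char *s, const char *p,
                       const char *o)
{
  FILE *fh = (FILE *)user_data;
  fprintf(fh, "<%s> <%s> \"%s\" .\n", s, p, o);
}
```

Replacing demo_print_triple with a handler that feeds a real RDF serializer would change the output syntax without touching the mapping code, which is the point of the abstraction.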
The RDF triples mapping I’ve made for the Flickr photo metadata has a mixture of vocabularies, which fall into 3 buckets:
- Existing Vocabularies: well known RDF schemas (class and properties) that have been developed over many years by multiple people and organisations, sometimes with a lot of formality.
- Flickr-specific Vocabularies: vocabularies I made up mostly for Flickr video and places API terms.
- Machine Tag Vocabularies: I made them up using machinetags.org/ns URIs as a root for the namespaces associated with the vocabularies. The terms in the vocabularies come from how people used machine tags on Flickr and are not always defined.
This spans what might be called semantic web heavy to light, although there is absolutely nothing wrong with mixing things up if you are not worried about inference. This is OK! I should probably put some HTML/schema documents at the vocabulary URIs and get the redirects and all that # and / business sorted so that the linked data works out properly, but what I have now is just a start and I’d be interested to see what people think. There are more details of the vocabularies and terms I’m using in the Flickcurl 1.4 release notes, although I should probably add vocabulary information to the documentation too.
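Machine tags on Flickr follow the namespace:predicate=value convention, so mapping one into RDF mostly means splitting it and building a predicate URI from the namespace and predicate. A minimal sketch, assuming a http://machinetags.org/ns/<namespace>#<predicate> layout; the exact URI form (# versus /) is the unsettled business mentioned above, so treat it as illustrative only. The function name machine_tag_to_uri is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split a machine tag "namespace:predicate=value" and
 * write a predicate URI into uri_buf. The "ns/<namespace>#<predicate>" layout
 * is an assumption for illustration. On success *value points into tag.
 * Returns 0 on success, -1 if tag is not a machine tag or uri_buf is too small.
 */
int machine_tag_to_uri(const char *tag, char *uri_buf, size_t uri_len,
                       const char **value)
{
  const char *colon = strchr(tag, ':');
  const char *equals;
  int n;

  if (!colon || colon == tag)
    return -1;                     /* missing namespace part */
  equals = strchr(colon + 1, '=');
  if (!equals || equals == colon + 1)
    return -1;                     /* missing predicate part */

  /* %.*s copies exactly the namespace and predicate slices of tag. */
  n = snprintf(uri_buf, uri_len, "http://machinetags.org/ns/%.*s#%.*s",
               (int)(colon - tag), tag,
               (int)(equals - (colon + 1)), colon + 1);
  if (n < 0 || (size_t)n >= uri_len)
    return -1;                     /* output would be truncated */

  *value = equals + 1;
  return 0;
}
```

The value becomes the object of the triple; because, as noted above, not every machine tag term is defined anywhere, a sketch like this cannot check that the generated predicate actually exists in a schema.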
That’s all for now, but I’ll expand some more in another post about the Flickr API itself, my experience with it and my impressions of it as both a software developer and an HTTP Web Service designer.
Raptor RDF Syntax Library V2 beta 1
Today I released the first beta version of Raptor 2. This is the culmination of about 9 months work refactoring the Raptor 1 codebase. In hindsight, I should have done this years ago, but I knew it would be a lot of work, and it was.
The reasoning behind doing this is multi-fold, but basically the code had a lot of cruft and bad design choices that couldn’t be removed without breaking the APIs in lots of ways, and at some point it’s easier to just do it all at once, and that’s where we are now.
Removing cruft meant deleting things that had been deprecated for a long time, but also renaming all the functions to follow the same “objects in C” style used throughout Redland’s libraries, which has standard naming forms:
- raptor_class_method()
- raptor_new_class() (core constructor or 1-arg constructor) and raptor_new_class_from_extras()
- raptor_class_copy()
- raptor_free_class()
The major addition was a raptor_world object that is used as a single object to hold on to all shared resources and configuration. This was a design pattern I had put in librdf and Rasqal but, for some reason, never considered for Raptor. That turned out to be a mistake, since I then had to pass a lot of parameters and configuration around to individual object instances, more than was really needed. One example is error handling, which added two parameters to several constructors. Error handling has now been expanded into a general log mechanism, modelled on librdf’s, that handles multiple structured log record types, and the logging policy is set once per world. The addition of the world object meant that every constructor for an object in Raptor now takes the world object, so it can get access to the shared configuration and resources. That alone made the change extensive and broad in scope. Having a single place to manage resources also makes it easier to ensure proper cleanup and to deal with library-wide issues.
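To make the world-object pattern and the naming forms concrete, here is a minimal sketch using a hypothetical demo_ prefix; it is not Raptor’s actual API. The world owns the shared log handler, every constructor takes the world, and objects log through it instead of carrying their own error-handler arguments.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical log callback, configured once per world. */
typedef void (*demo_log_handler)(void *user_data, const char *message);

/* Hypothetical world: owns shared configuration (here only logging). */
typedef struct {
  demo_log_handler log_handler;
  void *log_user_data;
} demo_world;

/* Hypothetical object following the demo_class_method() naming forms. */
typedef struct {
  demo_world *world;   /* borrowed pointer: shared config lives in the world */
  char *name;
} demo_parser;

demo_world *demo_new_world(void)
{
  return (demo_world *)calloc(1, sizeof(demo_world));
}

void demo_world_set_log_handler(demo_world *world, void *user_data,
                                demo_log_handler handler)
{
  world->log_handler = handler;
  world->log_user_data = user_data;
}

void demo_free_world(demo_world *world)
{
  free(world);
}

/* Every constructor takes the world, so no per-object log arguments. */
demo_parser *demo_new_parser(demo_world *world, const char *name)
{
  demo_parser *parser = (demo_parser *)calloc(1, sizeof(*parser));
  if (!parser)
    return NULL;
  parser->world = world;
  parser->name = (char *)malloc(strlen(name) + 1);
  if (!parser->name) {
    free(parser);
    return NULL;
  }
  strcpy(parser->name, name);
  return parser;
}

/* Methods log through the world's shared handler, if one is set. */
void demo_parser_log(demo_parser *parser, const char *message)
{
  demo_world *world = parser->world;
  if (world->log_handler)
    world->log_handler(world->log_user_data, message);
}

void demo_free_parser(demo_parser *parser)
{
  if (!parser)
    return;
  free(parser->name);
  free(parser);
}

/* Example handler: count messages; user_data must point to an int. */
void demo_count_log(void *user_data, const char *message)
{
  (void)message;
  (*(int *)user_data)++;
}
```

The parser only borrows the world pointer, so the world must outlive every object created from it; freeing objects before the world keeps cleanup in one predictable order.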
One other pain point was Raptor’s simplistic (but functional!) URI class, which manipulated URIs as plain old C strings (char*). I knew from building librdf that this could be made more efficient by interning the strings, so that a URI for a particular string is held only once and reference counted. I used Raptor’s existing AVL tree to implement it and, as a bonus, moved that AVL tree into the public API so it can be reused (Rasqal has a copy of the implementation). The resulting reference-counted URIs mean that after URI construction, comparison and copying are very cheap, and given that this is RDF, those are done a lot. The old URI code also had a swappable implementation which added a lot of complexity; that has gone now, since the new implementation is more sophisticated. There is probably more work that could be done here to make URIs work better, such as caching the URI structure so that it is quicker to generate relative URIs. Also, one day I should really validate that all the URIs built are syntactically legal.
Another long-term problem was the triple itself, which I had called a ‘statement’ way back when I created it. Unfortunately raptor_statement hard-coded the RDF specifics: the subject could only be a URI or blank node, the predicate could only be a URI, and so on. That made the code twisty. It has been replaced by an array of 3 or 4 raptor terms (URI, blank node or literal), so it can handle triples, quads and any possible extension beyond RDF (2004), although none of the current parsers or serializers expect non-RDF statements today. That change also made a lot of the internal code simpler to understand and quicker. The RDF terms were also made reference counted, and together with adding reference counting to statements, this means that passing triples around, which used to involve a lot of copying, is now a simple integer increment of the reference count. More speed!
That sorted out the fundamentals of statements, terms and URIs, and changed pretty much every piece of code that touched them in all the parsers, serializers and core code.
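A minimal sketch of interned, reference-counted URIs, assuming a simple linked list in place of Raptor’s AVL tree to keep it short (so lookup here is linear, not logarithmic). All names (demo_uri, demo_new_uri and so on) are hypothetical. Because equal strings share one object, equality becomes a pointer comparison and copying becomes an integer increment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned URI: equal strings share a single object. */
typedef struct demo_uri_s {
  struct demo_uri_s *next;   /* list chaining (stand-in for an AVL tree) */
  int usage;                 /* reference count */
  size_t length;
  char *string;
} demo_uri;

/* Hypothetical interning table; a real library would keep it in its world. */
typedef struct {
  demo_uri *head;
} demo_uri_table;

/* Return the shared URI for string, creating it on first use. */
demo_uri *demo_new_uri(demo_uri_table *table, const char *string)
{
  size_t length = strlen(string);
  demo_uri *uri;

  for (uri = table->head; uri; uri = uri->next) {
    if (uri->length == length && !memcmp(uri->string, string, length)) {
      uri->usage++;          /* already interned: just add a reference */
      return uri;
    }
  }

  uri = (demo_uri *)malloc(sizeof(*uri));
  if (!uri)
    return NULL;
  uri->string = (char *)malloc(length + 1);
  if (!uri->string) {
    free(uri);
    return NULL;
  }
  memcpy(uri->string, string, length + 1);
  uri->length = length;
  uri->usage = 1;
  uri->next = table->head;
  table->head = uri;
  return uri;
}

/* Copying is an integer increment, not a string duplication. */
demo_uri *demo_uri_copy(demo_uri *uri)
{
  uri->usage++;
  return uri;
}

/* Comparison is a pointer test because each string is interned once. */
int demo_uri_equals(const demo_uri *a, const demo_uri *b)
{
  return a == b;
}

/* Drop a reference; the last one unlinks the URI and frees it. */
void demo_free_uri(demo_uri_table *table, demo_uri *uri)
{
  demo_uri **link;

  if (!uri || --uri->usage > 0)
    return;
  for (link = &table->head; *link; link = &(*link)->next) {
    if (*link == uri) {
      *link = uri->next;
      break;
    }
  }
  free(uri->string);
  free(uri);
}
```

The same idea carries over to terms and statements: a statement holding an array of reference-counted term pointers can be passed around by bumping its own count rather than deep-copying its terms.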
There were a few pieces of new work added – two new serializers and one new parser. Two of those were written by Nicholas J Humfrey who is now a core committer.
I’d also like to call out thanks to Lauri Aalto for keeping raptor, rasqal and librdf relatively buildable while I was refactoring and breaking things. He wrote the code to make Rasqal and librdf build and work with raptor V1 and V2 at the same time.
Other work included updating all the reference documentation, tutorials, examples and sundry documentation for the new APIs including admin code to automate some of the documentation so it always included accurate details about formats.
There is lots more that changed in detail, listed in the Raptor 1.9.0 Release Notes and the help on upgrading, and there’s even a Perl script, docs/upgrade-script.pl, thrown in (generated by another Perl script!) that may help with applying the changes. The reference manual contains a full reference on changes between raptor 1.4.21 and 1.9.0 in the form of old / new mappings with explanations.
I know that Raptor 2 is not going to replace Raptor 1 for applications for some time, so this is a separately installed library with a new location for the header file and a new shared library base. However, once this hits 2.0.0 it’ll be a dependency of Rasqal and librdf.
Summary of release:
- All functions renamed to the raptor_class_method() form.
- All constructors take a raptor_world argument.
- Fixed issues 0000357, 0000361, 0000369, 0000370, 0000373 and 0000379.
It turns out that after all that, the resulting libraries for raptor 2 are actually 4% smaller than raptor 1 when installed (Debian, i386).
The gzipped tarball itself is as small as raptor 1.4.17 from 2008!
Get it at http://download.librdf.org/source/raptor2-1.9.0.tar.gz
PS Source code control has also moved to Git, hosted at GitHub.