RDF at The Venice Project

Please note that I'm moving all my personal website pages to my new blog on wordpress.com; this page may be removed at some point in the near future.

So, the secrecy is slowly being dropped.

As should be obvious from the sudden influx of people blogging about our little project, the company blogging policy was made available. Such. A. Relief. I hate secrecy. There are only a few things (that interest me) which I can't talk about, apparently. Yay!

So let's mention something which might be interesting to my audience.

We make extensive use of RDF in different places. It all starts with a core RDFS/OWL schema that is used to capture various kinds of information (think FOAF + IMDb + RSS + a lot more). I suspect some parts of the modelling work that was done here will make it into future standards for online video.

We have a custom distributed digital asset management system (or DAM), built around jena-with-postgres at the moment for storage and (CRUD-like) management of all that RDF-ized information over a REST protocol.
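To make "CRUD-like management over REST" a bit more concrete, here is a minimal sketch of how HTTP verbs might map onto operations on RDF-ish data. It is not our actual system: the `TinyAssetStore` class, the `/assets/1` path and the `dc:title` property are all made up for illustration, and a plain in-memory dict stands in for jena-with-postgres.

```python
from typing import Dict, Optional, Set, Tuple

# A (predicate, object) pair; the subject is the resource path itself.
Statement = Tuple[str, str]


class TinyAssetStore:
    """Hypothetical in-memory stand-in for an RDF store behind a REST API."""

    def __init__(self) -> None:
        self._by_subject: Dict[str, Set[Statement]] = {}

    def handle(self, method: str, subject: str,
               body: Optional[Set[Statement]] = None) -> Tuple[int, Set[Statement]]:
        """Map an HTTP-style verb to a CRUD operation; return (status, statements)."""
        if method == "GET":  # read
            if subject not in self._by_subject:
                return 404, set()
            return 200, set(self._by_subject[subject])
        if method == "PUT":  # create, or replace everything known about the subject
            created = subject not in self._by_subject
            self._by_subject[subject] = set(body or ())
            return (201 if created else 200), set(self._by_subject[subject])
        if method == "POST":  # update by adding statements to the subject
            self._by_subject.setdefault(subject, set()).update(body or ())
            return 200, set(self._by_subject[subject])
        if method == "DELETE":  # delete everything known about the subject
            existed = self._by_subject.pop(subject, None) is not None
            return (204 if existed else 404), set()
        return 405, set()  # verb not supported


store = TinyAssetStore()
store.handle("PUT", "/assets/1", {("dc:title", "Pilot episode")})
status, statements = store.handle("GET", "/assets/1")
```

A real service would sit behind an actual HTTP server and speak an RDF serialization on the wire; the point here is only the verb-to-operation mapping.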

We convert from RDF to different specialized XML formats and back again. We convert from RDF to excel spreadsheets and back again (ugh). We have our jira instance hooked up to our RDF store. We convert RDF to other kinds of RDF. We have custom RDF visualization tools. We have custom RDF store crawlers that do efficient validation. We have RDF schemas that control the behavior of other distributed systems by adding intelligence to the core schema. We do triple timestamping. We do intelligent schema-driven indexing. We have custom libraries to make doing wicket-based, RDF-based web application development easier. Oh, we do RDF-based web applications. In short, we do more RDF than you can shake a stick at. So not a day goes by without some of our developers swearing about "RDF" or "metadata", since in many ways RDF still isn't exactly mature technology. But we'll fix the warts, and contribute those fixes back to the open source community.
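For readers wondering what "triple timestamping" could look like, here is one possible reading, sketched in a few lines: record when each triple was first asserted, so that other systems can ask what has changed since a given moment. This is an assumption about the general idea, not a description of our implementation; the `TimestampedTriples` class and the example triples are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TimestampedTriples:
    """Hypothetical store that remembers when each triple was first asserted."""

    def __init__(self) -> None:
        self._asserted_at: Dict[Triple, datetime] = {}

    def add(self, triple: Triple, when: Optional[datetime] = None) -> None:
        # Keep the first recorded assertion time if the triple is already known.
        stamp = when or datetime.now(timezone.utc)
        self._asserted_at.setdefault(triple, stamp)

    def changed_since(self, cutoff: datetime) -> Set[Triple]:
        # Triples asserted strictly after the cutoff.
        return {t for t, ts in self._asserted_at.items() if ts > cutoff}


triples = TimestampedTriples()
monday = datetime(2006, 1, 2, tzinfo=timezone.utc)
tuesday = datetime(2006, 1, 3, tzinfo=timezone.utc)
triples.add(("ex:show1", "dc:title", "Pilot"), when=monday)
triples.add(("ex:show1", "dc:creator", "ex:alice"), when=tuesday)
fresh = triples.changed_since(monday)  # only the triple added on Tuesday
```

Something like `changed_since` is what would let a crawler or indexer pick up only the new statements instead of rescanning the whole store.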

In many ways, to me, the RDF part of our server architecture is much like WADI (I spent a year building the next version of it with asemantics before joining The Venice Project), with postgres instead of oracle, REST instead of SOAP, and a much less scary data model.

This RDF-based digital asset management (or DAM) seems like something everyone's doing right now. For example, Sesame has HTTPSail now, and the Simile people have Semantic Bank (I know of several more examples I'm not sure I can mention here).

Since everyone is inventing roughly the same wheel at the same time, and some people have re-invented it several times now, it is obvious it is about time for an open source project that does RDF-over-HTTP, properly. I've been talking to various people about this for a while now, and a bunch of us are almost ready to approach the Apache Incubator with a proposal for a project to build a "SPARQL endpoint". And The Venice Project will be donating some code (and developer time!) to seed this effort. Hopefully we will go from annoyingly secretive to actively open (and open source) in the space of a few weeks.
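As a rough idea of what talking to such an endpoint looks like from the client side, here is a small sketch that builds (but does not send) a query request in the style of the W3C SPARQL Protocol: the query travels URL-encoded in a `query` parameter of a GET request. The endpoint URL and the query are invented for the example, and `build_sparql_request` is a hypothetical helper, not part of any existing project.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request carrying a SPARQL query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for JSON-formatted results; a server may offer other formats too.
    headers = {"Accept": "application/sparql-results+json"}
    return Request(url, headers=headers, method="GET")


# Hypothetical endpoint and query, for illustration only.
endpoint = "http://example.org/sparql"
query = "SELECT ?title WHERE { ?show <http://purl.org/dc/elements/1.1/title> ?title }"
request = build_sparql_request(endpoint, query)
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out here because the endpoint above does not exist.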

Now, back to work. Or rather, lunch, in London, with some people on our content team. See if the sushi is even better here than it was in New York.