Monday, May 29, 2006

An RDF Revelation

While doing some reading recently, I had an RDF revelation. I've long felt I didn't really get RDF. This time, the parts that sank in made a bit more sense. I'm not a convert in this particular religious war, but I do feel like I now understand both sides a bit better.

I've read the W3C RDF Primer before; several times, I think. The first thing that struck me this time was a simple fact I know I'd read before but had forgotten--that an object can be either a URIref or a literal (a URI referencing a definition elsewhere, or a string containing a human-readable value). This means the strict machine-readable definitions of things that RDF strives to achieve are potentially only half there--only the predicate (the relationship between the subject and object) is expected to be a reference to a presumably shared source. I assume this option exists for ease of use. Certainly, building up an infrastructure that allows all values to be referenced rather than declared would require an unreasonable amount of startup time. This sort of thing is better done in an evolutionary fashion than forced at the start; a reasonable decision on the part of RDF.
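To make the URIref-versus-literal distinction concrete, here is a minimal sketch in plain Python. It is not any RDF library's API: the URIRef, Literal, and Triple classes and the object_is_shared_reference helper are hypothetical names made up for illustration, and the example.org URIs are placeholders. Only the Dublin Core creator URI is a real, shared identifier.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URIRef:
    """A reference to something identified elsewhere by a URI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain string value meant for human reading."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URIRef
    predicate: URIRef              # always a reference to a shared term
    obj: Union[URIRef, Literal]    # either a reference or a bare string


def object_is_shared_reference(triple: Triple) -> bool:
    """True when the object points at a URI rather than holding a string."""
    return isinstance(triple.obj, URIRef)


# Real Dublin Core term; the example.org URIs below are illustrative only.
DC_CREATOR = URIRef("http://purl.org/dc/elements/1.1/creator")
page = URIRef("http://www.example.org/index.html")

# Same statement twice: once with a referenced creator, once with a string.
by_reference = Triple(page, DC_CREATOR,
                      URIRef("http://www.example.org/staffid/85740"))
by_literal = Triple(page, DC_CREATOR, Literal("John Smith"))

for t in (by_reference, by_literal):
    print(t.obj.value, "->", object_is_shared_reference(t))
```

Both triples are equally valid statements; only the first gives a machine something it can follow to a shared definition of the creator.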

RDF contains some other constructs to make things easier; for example, blank nodes to group a set of nodes (or, in the words of the primer, provide "the necessary connectivity"). Blank nodes are a further feature that allows entities to go without formal identification. The primer discusses a case using a blank node to describe a person, rather than relying on a URI such as an email address as an identifier for that person. A convenient feature, certainly, but also a step away from the formal structures envisioned in Semantic Web Nirvana.
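A blank node can be sketched the same way. The snippet below is again plain Python with hypothetical names (BlankNode, new_blank_node, describe), not a real RDF toolkit, and the example.org terms and the book URI are assumptions chosen for illustration. The point it tries to show is that the person has no URI of their own, yet still ties several statements together.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class BlankNode:
    """A node with no global identifier; the label only matters locally."""
    label: str


Node = Union[URIRef, Literal, BlankNode]

_labels = count(1)


def new_blank_node() -> BlankNode:
    """Mint a blank node with a label unique within this script."""
    return BlankNode(f"b{next(_labels)}")


EX = "http://www.example.org/terms/"   # illustrative vocabulary, not a real one

person = new_blank_node()              # the person is never given a URI
graph = [
    (person, URIRef(EX + "name"), Literal("Jane Smith")),
    (person, URIRef(EX + "mailbox"), URIRef("mailto:jane@example.org")),
    (URIRef("http://www.example.org/book/1"), URIRef(EX + "author"), person),
]


def describe(node: Node, triples: list) -> dict:
    """Collect every predicate/object pair whose subject is the given node."""
    return {p.value: o for s, p, o in triples if s == node}


print(describe(person, graph))
```

The mailbox is recorded as a property of the person rather than standing in as the person's identifier, which is the convenience (and the loss of formality) described above.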

So now I'm looking at the whole XML vs. RDF discussion much more as a continuum than as a pair of opposing philosophical perspectives. The general tenor of RDF is that it expects everything to be declared in an extremely formal manner. But there are reasonable exceptions to that model, and RDF seems to make them. I'd argue now that both RDF and XML represent practical compromises. Both strive for interoperability in their own way. It's just a question of degree whether one expects a metadata creator to check existing vocabularies, sources, and models for individual concepts (RDF-ish) or for representing entire resources (XML-ish). I see the value of RDF for use in unpredictable environments. Yet I'm still not convinced our library applications are ready for it. The reality is that libraries are still for the most part sharing metadata in highly controlled environments where some human semantic understanding is present in the chain somewhere (even in big aggregations like OAIster). (Of course, if we had more machine-understandable data, that human step would be less essential...)

I'm a big champion of two-way sharing of metadata between library sources and the "outside world." I just don't think the applications that can make use of RDF metadata for this purpose are yet mature enough to make it worth the extra development time on my end. And, again, the reality is that it really would take significant extra development time for me. The metadata standards libraries use are overwhelmingly XML-based rather than RDF-based. XML tools are much more mature than RDF tools. I fully understand the power of the RDF vision. But this is one area I just can't be the one pushing the envelope to get there.


Anonymous said...

Hmm, I'm not sure I understand the idea of an 'XML vs RDF debate'. Although I'm fairly new to both XML and RDF.

I mean, isn't the most common way to encode RDF _as_ XML? XML is a transmission/encoding format, which a variety of metadata vocabularies can be encoded in (as well as a variety of other structured data, not just metadata vocabularies!). RDF is a metadata vocabulary (of a certain generalized type, meant to accomplish certain fairly ambitious goals in a machine environment) which can be encoded in various ways, XML being one of them.

I don't understand the idea of an XML vs. RDF debate. It sounds to me like talking about a "DC vs. XML debate", which wouldn't make any more sense. Can anyone point me to this debate being engaged in online, so I can understand what the debate is about?

RDF seems awfully useful to me. I suspect that it will continue to catch on, and eventually be a well adopted standard. I hope that the library world continues to get more engaged in RDF, to help it develop, and to not be left behind if/when it does reach the tipping point. But we certainly shouldn't put all our eggs in one basket.

Either way, XML is obviously here to stay, RDF or no. RDF will typically be expressed in XML.

Am I missing something?

Just discovered your blog, great blog, thanks.


Anonymous said...

PS: I probably should have called RDF a 'metadata language' rather than 'vocabulary'. Somewhat more accurate. Consensus terminology for talking about these things isn't quite there.

Jenn Riley said...

I agree with you it's not necessarily an either/or thing. RDF definitely has an XML serialization available for use.

But in the library world, we tend to record metadata (and other stuff) in XML using XML languages defined by XML DTDs and/or W3C XML Schemas (RelaxNG hasn't caught on too much in the library world yet). We don't tend to use RDF semantics because in general (waaaaay oversimplifying here, calm down everyone) library types don't see the extra work to move to RDF, with the greater degree of explicitness it entails, as worth the investment. XML with W3C XML Schema lets us do some interesting stuff. RDF in XML (or any other encoding of RDF) would let us do more, but we're not sure if taking the leap will pay off.
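As a rough illustration of what "RDF in XML" looks like, here is a small sketch that uses only Python's standard xml.etree.ElementTree module to write one description in the RDF/XML serialization. The to_rdf_xml function and the example.org URIs are hypothetical, and the sketch covers only a sliver of the syntax; the RDF and Dublin Core namespace URIs are the real ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(subject_uri: str, title: str, creator_uri: str) -> str:
    """Serialize two statements about one subject as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject_uri}
    )
    # A literal object becomes element text.
    title_el = ET.SubElement(desc, f"{{{DC_NS}}}title")
    title_el.text = title
    # A URIref object becomes an rdf:resource attribute.
    ET.SubElement(
        desc, f"{{{DC_NS}}}creator", {f"{{{RDF_NS}}}resource": creator_uri}
    )
    return ET.tostring(root, encoding="unicode")


xml_text = to_rdf_xml(
    "http://www.example.org/index.html",      # illustrative subject
    "Example Home Page",                      # illustrative literal
    "http://www.example.org/staffid/85740",   # illustrative creator URI
)
print(xml_text)
```

The result is ordinary well-formed XML, so XML tools can parse it, but its meaning comes from the RDF model underneath rather than from a DTD or schema for the element structure.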

Anonymous said...

Isn't there a DTD and/or W3C XML Schema for RDF as serialized/encoded in XML?

But I see what you're saying that RDF hasn't caught on in the library world.

Still seems to me like "to use RDF or not" is a better description than "RDF or XML".