Sunday, March 01, 2009

Google vs. Semantic Web

On a number of fronts recently I've been thinking a bunch about RDF, the DCMI Abstract Model, and the Semantic Web, all with an eye towards understanding these things better than I have in the past. I think I've made some progress, although I can't claim to fully grok any of these yet. One thing does occur to me, although it's probably a gross oversimplification. The difference between the Semantic Web/RDF approach and, say, the Google approach is this: is the robustness in the data or is it in the system?

The Semantic Web (et al.) would like the data to be self-explanatory: to state explicitly what it is describing, with explicit reference to all the properties used in the description. At the opposite end of the spectrum are systems like Google, which assume some kind of intelligence went into the creation of the data but don't expect the data itself to explicitly manifest it. The approach of these systems is to reverse engineer that data, getting at the human intelligence that created it in the first place.
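To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how any real search engine or RDF toolkit works: the example.org book identifier, the sample sentence, and both function names are made up, and the "by <Name>" pattern is a deliberately crude stand-in for reverse engineering. Only the Dublin Core terms namespace is real.

```python
import re

# Real Dublin Core terms namespace; the example.org resource is hypothetical.
DCTERMS = "http://purl.org/dc/terms/"
BOOK = "http://example.org/book/1"

# Self-describing data: each statement names its subject, property, and value.
triples = [
    (BOOK, DCTERMS + "title", "Moby-Dick"),
    (BOOK, DCTERMS + "creator", "Herman Melville"),
]

def creator_from_triples(statements, subject):
    """Look the answer up; the data itself says which value is the creator."""
    for s, p, o in statements:
        if s == subject and p == DCTERMS + "creator":
            return o
    return None

# Undescribed data: the consuming system has to guess at the structure.
page_text = "Moby-Dick by Herman Melville. First published in 1851."

def creator_from_text(text):
    """Guess the creator with a crude 'by <Capitalized Name>' heuristic."""
    match = re.search(r"\bby ([A-Z]\w+(?: [A-Z]\w+)*)", text)
    return match.group(1) if match else None

print(creator_from_triples(triples, BOOK))
print(creator_from_text(page_text))
```

Both calls find the same creator here, but the work sits in different places. The first function is trivial because whoever encoded the data did the hard part; the second puts the burden on the decoder, and its guess breaks as soon as the text stops following the pattern it expects.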

The difference is one of who is expected to do the work - the system encoding the data in the first place (Semantic Web approach) or the system decoding the data for use in a specific application (Google approach). Both obviously present challenges, and it's not clear to me at this point which will "win." Maybe the "good enough and a person can go the last bit" approach really is appropriate - no system can be perfect! Or maybe as information systems evolve, our standards for their performance will rise to the point where self-describing data is demanded. As a moderate, I guess I think both will probably be necessary for different uses. But which way will the library community go? Can we afford to have feet in both camps into the future?

1 comment:

Casey Mullin said...

I believe the Semantic Web and Google's evolving efforts are not mutually exclusive. The vision of the SW, IMHO, is to improve data's ability to describe itself and its relationship to other entities. This concept is not new, either to the web as a whole or to libraries' corner of it. Documents have always had creators, subjects and relationships to other documents. Google's ability to bring this out with their own brand of search engine magic has galvanized its prominence in the public consciousness and, by extension, that of library users (including librarians)!

Google's efforts to, as you say, reverse engineer data will not be co-opted by the new world order that SW technology will germinate. Rather, as the architecture of the data improves, Google's role will evolve apace. The framers of the SW envision semi-intelligent "agents" that will navigate data relationships and perform reasoning in order to accommodate a user's complex request. This is not a far cry from what Google does now; the SW will only help Google do it better. As for robust metadata for library resources, semantic search engines will not circumvent the need for it; rather, they will presuppose its existence.