Monday, July 11, 2005

Structure standards and content standards

It's funny how related things seem to come in spurts in our lives. Or maybe it's just that once we notice something, it's easier to notice it again. In my standard metadata spiel, I, like many others, distinguish between structure standards, which tell you what "fields" (for lack of a better term) to record, and content standards, which tell you how to formulate the values in those fields. Content standards can be either rules for constructing values or actual lists of permissible entries. It's an extremely useful distinction. Yet I've noticed recently that it's frequently misunderstood, or that the distinction remains implicit in a conversation rather than being made explicit.

One place this trend caught my eye recently was a blog post by Christopher Harris on using LII's RSS feed to generate MARC records, along with subsequent comments and posts by several people, including Karen Schneider of LII. Most of the ensuing discussion was about keeping the two data sources in sync, which is certainly important to plan for. But I noticed a conspicuous absence of content standards in the discussion. MARC records, of course, do not have to adhere to AACR2 practices. In fact, our catalogs hold millions of non-AACR2 records, mostly created before AACR2 and never upgraded for practical reasons. But anyone creating a MARC record today would be prudent to either use AACR2 or have a compelling argument against it. Neither option came up in this discussion. Reading between the lines, I suspect the transformation would be reasonably straightforward, but one shouldn't have to read between the lines to know.

I suppose what I'm really saying here is that when talking about these sorts of activities, we need to fully define the problem before a solution can be determined. And that includes dealing with content standards in addition to structure standards. Explicitly. That means knowing which standards (or lack of them) are in use in the source data and which are expected in the target schema, and planning for how to move between them. This is an extremely interesting topic, and I personally would love to see more discussion about it.

Oh, and, for the record, I'm with Karen that one would want to be careful about putting lots of records for things like LII content into our MARC catalogs. The format (and the content standard normally used with it) doesn't describe this type of material well, and the systems in which we store and deliver our MARC records don't provide the sort of retrieval we might want for these materials. So my vision (imperfectly focused, unfortunately!) is that our users would be better served by a layer on top of the catalog that also provides retrieval on other information sources better suited to describing these materials. This higher-level system would provide some basic searching but, most importantly, would lead users down into the specific information sources that best meet their needs. We have lots of technologies and bits of applications that might be used for this purpose. I wonder what will emerge.

