Monday, January 28, 2008

Metadata interoperability

I'm not especially handy. (Get up, those of you who just fell on the floor laughing at the understatement of the century. Stop it. Right now. I know who you are.) Inspired by a recent minor home repair (which would probably be a trivial repair for you normal folks!) involving a screwdriver (flathead, if you were wondering, I'm not quite that inept), I've been thinking about how to explain metadata interoperability in terms of tools.

I've long held that interoperability by prescribing a single way of doing things is unsustainable, even at a relatively small scale, and it seems to me those sets of many-sized gadgets can show us a path forward. Wrenches, bolts, sockets and the like are not all the same. Rather, a bolt is chosen based on what it needs to do - what functions it needs to support and the environment in which it needs to fit. The same is true for descriptive metadata standards. The ones used for a specific class of materials need both to match well with the materials themselves and to be supported in the institutional environment.

On the other side, how to deal with a bolt may not be immediately obvious, but it's not all that hard to figure out. There are many options for what size socket wrench would be needed to tighten or loosen it. It would be nice if the bolt clearly stated what size it was, and this happens in some cases but certainly not in the majority. The wrench needed might be of the type measured in inches or the type measured in millimeters. We can consider these akin to the different approaches to description taken by libraries and archives. A practiced eye can examine the bolt and guess at the right size wrench to try, and will likely get close, maybe with one or two missteps. With trial and error, a novice can find the right wrench as well. I believe the same is true for a system using metadata from an outside source - a skilled human can tell a lot about how best to use it from a quick glance, and trying various tactics on it will show how well various choices work, for expert and novice alike. With time and expertise we can transfer some of these evaluations from a human to a system (we can do some of this now; for example, determining which metadata format from a predefined list is in use can be easily automated), but the process is still the same.
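To make the automation point concrete: for XML-based formats, format detection can often be as simple as inspecting the root element's namespace. Here is a minimal sketch; the namespace URIs are the published ones for MARCXML, MODS, OAI Dublin Core, and METS, but the function name and the particular list of candidate formats are my own illustration, not any standard tool.

```python
import xml.etree.ElementTree as ET

# Map XML namespace URIs to human-readable format names.
# The URIs are the officially published ones; the list itself is illustrative.
KNOWN_NAMESPACES = {
    "http://www.loc.gov/MARC21/slim": "MARCXML",
    "http://www.loc.gov/mods/v3": "MODS",
    "http://www.openarchives.org/OAI/2.0/oai_dc/": "OAI Dublin Core",
    "http://www.loc.gov/METS/": "METS",
}

def sniff_format(xml_text):
    """Guess a record's metadata format from its root element's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree renders namespaced tags as '{namespace-uri}localname'
    if root.tag.startswith("{"):
        ns = root.tag[1:].split("}", 1)[0]
        return KNOWN_NAMESPACES.get(ns, "unknown")
    return "unknown"

record = '<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo/></mods>'
print(sniff_format(record))  # → MODS
```

Of course, this only handles the easy part; deciding how well a given record actually fits a local system still takes the kind of human judgment described above.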

The point as I see it is that for these types of tools there are many options, but each of them are well-defined and well-documented. The same is true for metadata. As we gain more experience, we work towards defining various approaches that work for a given class of materials in a given environment. We then don't need to re-evaluate everything every time we start something new; we can refer to existing knowledge to see that in this case that approach has proven to be useful.

Maybe this analogy fails once you look closer (especially on the tool side - what's here pretty much represents my entire understanding of those types of things), but, like any analogy, it's bound to break down at some point of examination. But I've been pondering a bit and for now I still think it's useful. Feel free to argue with me, though!

Sunday, January 20, 2008

Musings on RDA, LC Working Group Report, and various other random things

I’ve been staying on the outskirts of the flurry of activity the last few months surrounding RDA and the Library of Congress Working Group on the Future of Bibliographic Control. Various things have been swimming around in my brain during this time, and I think now they’re finally ready to come out. I don’t know that I have any conclusions or suggestions for future directions, but that’s what I see as the function of this blog – to think through things that may or may not end up anywhere interesting.

I read the LC Working Group on the Future of Bibliographic Control’s draft report, and submitted comments in time for the Working Group’s consideration, via a web form that seemed designed for quick email questions rather than substantive feedback. I didn’t post my comments on the blog, as many others did, as I felt my comments were pretty boring – they mostly were of the type “this paragraph seems to say x, but I wonder if you really meant to say y….” Overall, I was impressed with the report, and thought it represented an admirable vision for the directions in which libraries should be heading. It also struck me as largely avoiding library politics, although I thought it was odd that a specific reference to FAST disappeared between the draft and final reports – I wonder what that was about? I liked the report’s boldness in pushing attention toward special collections, and its tough questions about the continued utility of MARC and LCSH.

But I, like many others, found a bit of schizophrenia in some of the specific recommendations. The report is not afraid to take a bold stand on MARC, but stops well short of recommending a move away from tens of thousands of distributed copies of bibliographic records, and (new in the final version, I think) questions RDA’s move away from ISBD. The report recommends moving quickly to work on new bibliographic frameworks but even more forcefully says that RDA should wait before proceeding. It provides many recommendations discussing how to improve moving information in and out of the catalog but provides little in the way of rethinking the function of the catalog itself. I believe some of this inconsistency is the result of trying to address comments the WG received (although I don’t see any changes related to any of my comments in there!), but most of it is probably due to the fact that this is a committee effort, written and revised on a short schedule. The biggest disappointment for me in the final report was that my favorite recommendation from the draft lost all of its power. In the draft report, one of the recommendations relating to LIS curricula described some extremely technical and theoretical topics as essential to offer. I believe cultivating individuals with both system and information expertise is the single most effective thing we can do to ensure libraries play a part in the future information environment. In the final report, this recommendation was sanitized to simply say LIS curricula should include “advanced knowledge and topics.” Bleah. That could mean anything.

One other thing regarding the LC WG report: representatives from both Google and Microsoft served on the Working Group, but I see little if any evidence in the report that these individuals contributed points of view that haven’t been making the rounds within the library community already. That’s unfortunate. We need some outside points of view in this community.

I know the term “bibliographic control” has been questioned in relationship to this report. Roy Tennant suggests “descriptive enrichment” instead. I recognize the problems with bibliographic control – it sounds so authoritarian in the face of the open vision the report outlines. But all labels are words, and words have baggage. I’m not clever with names (my dog is named Daisy, if that gives you a sense of how un-creative I am in this area), but I’m skeptical that any brief name could capture what we’re trying to do here. “Descriptive enrichment” to me calls up images of armies of humans manually adding things to records, an image I think we don’t want to be promoting. So I’ll remain neutral on the name issue – if someone comes up with a new one that folks like, I’d be happy to start using it. But I’m unlikely to be the one thinking that new label up.

I read many of the responses to the LC WG report that appeared on blogs, and found myself agreeing with many of the points made, and disagreeing with others. Pretty standard reaction, I suspect. I found OCLC’s response quite odd, however. It had the general tenor of “we’re doing all that stuff already, don’t worry, just trust us…” while at the same time oversimplifying the issues in a way I found totally inappropriate for a response to a committee of experts. For example, the OCLC response touts its FRBR work as the kind of testing the WG didn’t realize was happening, but it glosses over the fact that the Work-level clustering and other FRBR-like things OCLC has been doing aren’t true FRBR implementations. This community needs clarity and hard truths on these issues right now, not something that’s been reviewed by marketing. OCLC Research and RLG Programs have been doing extremely interesting things recently, but few if any of them make their way to the productized mainstream of OCLC in ways that promote the state of the art or even fit well with the vision outlined in the LC WG report. I hope OCLC takes the report’s recommendations to heart in the same way LC and the rest of us are trying to do.

I found RDA’s reaction (or lack thereof) to the LC WG report to be of note as well. The folks behind RDA (the “Committee of Principals” for those of you in the know for such things) have on the RDA web site a response dated the same day the WG closed its call for comments. Presumably they submitted this document as an official comment in the appropriate time frame. The response, as the preface to the final LC WG report notes, smacks of “we’re too far along to stop now,” which in my mind is equivalent to “we/he/she have worked really hard, so what we came up with must be good,” which I believe is completely and totally bogus. It also lays the guilt trip on LC – saying basically “we’d hate to lose your input.” What the response doesn’t do is address directly (aside from listing a few pseudo-FRBR implementations that one can’t imagine the LC WG didn’t know about) any of the concerns raised in the report. It looks to me like more of people talking past each other and being defensive rather than trying to find common ground. Of course, the RDA response says they won’t be stopping development, which is no surprise at all. (Really, did anyone think they would? Wishful thinking doesn’t count.)

Through all this, I remain agnostic about RDA. I figure at some point I’m going to have to form an opinion, but frankly, I haven’t had the time to invest to develop an informed one. I haven’t read the last set of drafts (released December-ish), and with previous drafts I had trouble devoting the mental energy to them to see the forest of general vision and effectiveness for the trees of specific rules. I like the idea of more explicit connections to FRBR being behind the new organization, but it looks awfully complex. FRBR of course is complex, but I can’t help wondering if there’s another way to make the connection. I also understand (but again, haven’t seen myself) that the new drafts and/or supporting documents use terminology from the DC Abstract Model, including “literal value surrogate” and the like. I’m as intimidated by the terminology as the next person, but I do think it’s worth it to introduce some intellectual stringency to this process. I’m just not sure how to do that and still make the documents accessible.

Martha Yee has put online a set of cataloging rules she’s developed as a response to what seems to be the insanity surrounding us. I’ve long thought Martha was a clear voice in pushing against the book-centric focus of the cataloging community and realizing the importance of the display of information to users (in addition to just how we store it), but I’ve found myself disagreeing strongly with some of her more recent work that seems not to understand the state of the art with regards to search engines, information retrieval, or artificial intelligence. I haven’t read her cataloging rules yet, but I’m encouraged that she’s come up with some sort of concrete alternative (rather than just complaining, like the rest of us do), and she seems to be working towards an RDF model for her cataloging rules – bravo! I think any new set of rules, to be successful, however, needs to be written to take advantage of current machine processing technologies. Not having read either the latest RDA drafts or Yee’s rules, I can’t say whether they do this or not. One can only hope.

And hope is where I am for the future of libraries. We have a lot going on right now in libraries, and I consider that a good thing. To use an old adage, we can’t be so afraid we’ll make a mistake that it prevents us from doing anything at all. Because we will make mistakes. We’re human. No matter how many people we involve, no matter how many levels of review we have, there will be things we try that don’t work out. If we realize that ahead of time we’ll be able to recover and try new things that will work. We’ve done so much already, and we have in our community an enormous number of insightful, dedicated individuals with a vision for where we’re going. Now we just have to find a way to let that vision emerge from the bureaucracy and the power of inertia.