Wednesday, June 28, 2006

A quest for better metadata

I wasn’t able to attend the ACM/IEEE Joint Conference on Digital Libraries this year, but the buzz surrounding the paper by Carl Lagoze et al. about the challenges a large aggregator faces despite using supposedly low-barrier methods such as OAI led me to look up the written version. The paper demonstrates very well that no matter how “low-barrier” the technology (OAI) or the metadata schema (DC), bad metadata makes life difficult for aggregators. “Garbage in, garbage out” has been a truism for some time; the “magic” behind current technology can help, but it can only go so far to compensate for poor input.

There has been spirited discussion in the library world recently about next generation catalogs, but that discussion has heavily centered on systems rather than the data that drives them. I’d argue that one needs both highly functional systems and good data in order to provide the sorts of access our users demand. How we get that good data is what I’ve been interested in recently. Humans generating it the way libraries currently do is one part of a larger-scale solution, but given the current ratio of interesting resources to funding for humans to describe them, we must find other means to supplement our current approach.

So what might we do? Here are my thoughts:

  • Tap into our users. There are a whole lot of people out there who know and care a lot more about our resources than Random J. Cataloger. Let’s harness the knowledge and passion of those users, and provide systems that let them quickly and easily share what they know with us and other users.

  • Get more out of existing library data. As Lorcan Dempsey says, we should “make our data work harder.” Although MARC and other library descriptive traditions have many limitations in light of next-generation systems, they still represent a substantial corpus of data that we must use as a basis for future enhancements. Let’s use any and all techniques at our disposal to transform this data into data that can drive these next-generation systems.

  • Look outside of libraries. Libraries do things differently than publishers, vendors, enthusiasts, and many other communities that create and use metadata. We should keep in mind the cliché, “Different is not necessarily better.” We need to both look at ways of mining existing metadata from other communities to meet our needs, and re-examine the way we structure our metadata with specific user functions in mind.

  • Put more IR techniques into production. Information retrieval research provides a wide variety of techniques to better process metadata from libraries and other communities. Simple field-to-field mapping is only a portion of what we can make this existing data do for us. We must work with IR experts to push our existing data further. IR techniques can also be made to work not just on metadata but on the data itself. Document summarization, automatic metadata generation, and content-based searching of text, still images, audio, and video can all provide additional data points for our systems to operate upon.

  • Develop better cooperative models. Libraries have a history of cooperative cataloging, yet this process is anything but streamlined. We simply must get away from models where every library hosts local copies of records, and each of those records receives individual attention, changing, enhancing, even removing (!) data for local purposes. Any edits or enhancements performed by one should benefit all, and the current networked environment can support this approach much better than was possible when cooperative cataloging systems were first developed.
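
To make the “simple field-to-field mapping” mentioned above concrete, here is a minimal sketch in Python. It assumes a record is just a list of (MARC tag, value) pairs, which is a big simplification of real MARC, and the names CROSSWALK and marc_to_dc are hypothetical, made up for this illustration. The four tag-to-element pairings follow commonly used MARC to Dublin Core correspondences, but a real crosswalk covers far more fields and subfields, and the punctuation cleanup here is deliberately naive.

```python
# Hypothetical, heavily simplified crosswalk: MARC tag -> Dublin Core element.
# A production crosswalk would handle subfields, indicators, and many more tags.
CROSSWALK = {
    "100": "creator",    # main entry, personal name
    "245": "title",      # title statement
    "260": "publisher",  # publication information
    "650": "subject",    # topical subject heading
}


def marc_to_dc(record):
    """Map a list of (tag, value) pairs to a dict of DC element -> list of values."""
    dc = {}
    for tag, value in record:
        element = CROSSWALK.get(tag)
        if element is None:
            continue  # fields with no mapping are simply dropped
        # Naive cleanup: strip trailing ISBD-style punctuation and whitespace.
        cleaned = value.strip().rstrip(" /:;,.")
        if cleaned:
            dc.setdefault(element, []).append(cleaned)
    return dc


# Made-up sample record for illustration only.
sample = [
    ("100", "Bach, Johann Sebastian,"),
    ("245", "Goldberg variations /"),
    ("650", "Variations (Harpsichord)"),
    ("999", "local note"),
]

print(marc_to_dc(sample))
```

Even this toy version shows the limits of mapping alone: anything without a target element (the local 999 field here) is lost, and nothing new is learned about the resource. That gap is where the other techniques in the list above would have to come in.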

My point is, we can’t plug our ears, sing a song, and keep doing things the way we have been. Let’s make use of the developments around us, contribute the expertise we have, and all benefit as a result.

Saturday, June 24, 2006

Finding new perspectives

I spent last week at a conference with an extremely diverse group of attendees. Almost all were trained musicians; among these were traditional humanist scholars, librarians of all sorts, and a smattering of technologists. I spoke at two sessions, each on a topic related to how library systems might better meet the needs of our users. I was pleasantly surprised by the environment in these sessions, and in the conference as a whole.

Due to the diversity of attendees, I had feared that my ideas might be either rejected wholesale in light of very real and valid practical concerns, or ignored due to a perception that they were irrelevant to the work of many attendees. I was wrong. I had many stimulating discussions that generated ideas on both sides with other attendees, most of whom don't spend their time thinking about system design as I'm lucky enough to do. My perspective of thinking big and not being satisfied by what current systems deliver was greeted with a great deal of enthusiasm, showing me in no uncertain terms just how connected and devoted many librarians (and those in related fields) are to the needs of our users. Perhaps those who disagreed with my approach were just being polite in not expressing major differences in perspective publicly or privately (it was an international conference and I admit to not fully understanding all the cultural factors at work); I hope not, or at least I'd like to think that such disagreements could take the form of collegial conversation that starts in a session then continues afterward to the benefit of both parties. But, then again, I can be an optimist about such things.

Perhaps the most surprising thing was that my point of view wasn't the most progressive there. I had a number of conversations with attendees whose vision was broader, bolder, and more of a departure from the current environment than mine. I view myself as striking a reasonable compromise between vision and practicality in the digital library realm, but my preconception of this conference was that I would be very far outside the attendees' respective norms. I was certainly on the progressive side, but it was good to see I had company, and even a few compatriots who were further out to stimulate discussion.

What I took away was that we in the digital library world have a tendency to navel-gaze, to think we're the only ones who can plan our next-generation systems. This week I found an excellent cross-section of groups we need to engage more fully in this discussion. Without them and others like them, we're missing vital ideas.