Wednesday, June 28, 2006

A quest for better metadata

I wasn’t able to attend the ACM/IEEE Joint Conference on Digital Libraries this year, but the buzz surrounding the paper by Carl Lagoze et al. about the challenges a large aggregator faces despite using supposedly low-barrier methods such as OAI led me to look up the written version. The paper demonstrates very well that no matter how “low-barrier” the technology (OAI) or the metadata schema (DC), bad metadata makes life difficult for aggregators. “Garbage in, garbage out” has been a truism for some time, and the “magic” behind current technology can help, but it can only go so far to compensate for poor input.

There has been spirited discussion in the library world recently about next generation catalogs, but that discussion has heavily centered on systems rather than the data that drives them. I’d argue that one needs both highly functional systems and good data in order to provide the sorts of access our users demand. How we get that good data is what I’ve been interested in recently. Humans generating it the way libraries currently do is one part of a larger-scale solution, but given the current ratio of interesting resources to funding for humans to describe them, we must find other means to supplement our current approach.

So what might we do? Here are my thoughts:

  • Tap into our users. There are a whole lot of people out there who know and care a lot more about our resources than Random J. Cataloger. Let’s harness the knowledge and passion of those users, and provide systems that let them quickly and easily share what they know with us and other users.

  • Get more out of existing library data. As Lorcan Dempsey says, we should “make our data work harder.” Although MARC and other library descriptive traditions have many limitations in light of next-generation systems, they still represent a substantial corpus of data that we must use as a basis for future enhancements. Let’s use any and all techniques at our disposal to transform this data into that which drives these next-generation systems.

  • Look outside of libraries. Libraries do things differently than publishers, vendors, enthusiasts, and many other communities that create and use metadata. We should keep in mind the cliché, “Different is not necessarily better.” We need to both look at ways of mining existing metadata from other communities to meet our needs, and re-examine the way we structure our metadata with specific user functions in mind.

  • Put more IR techniques into production. Information retrieval research provides a wide variety of techniques to better process metadata from libraries and other communities. Simple field-to-field mapping is only a portion of what we can make this existing data do for us. We must work with IR experts to push our existing data further. IR techniques can also be made to work not just on metadata but on the data itself. Document summarization, automatic metadata generation, and content-based searching of text, still images, audio, and video can all provide additional data points for our systems to operate upon.

  • Develop better cooperative models. Libraries have a history of cooperative cataloging, yet this process is anything but streamlined. We simply must get away from models where every library hosts local copies of records, and each of those records receives individual attention, changing, enhancing, even removing (!) data for local purposes. Any edits or enhancements performed by one should benefit all, and the current networked environment can support this approach much better than was possible when cooperative cataloging systems were first developed.

My point is, we can’t plug our ears, sing a song, and keep doing things the way we have been. Let’s make use of the developments around us, contribute the expertise we have, and all benefit as a result.


Anonymous said...

"Tap into our users."

This is, I think, one of the most important ideas we need to figure out how to incorporate into our metadata environment.

It won't be easy. In the current OCLC-centered library metadata environment, it's a tiny minority of even professional catalogers that have the authority to make changes to cataloging records--or provide their corrections in some other way. When we don't even tap into most of those in our own professional community...


Jenn Riley said...

I totally agree that a big change is necessary here. I recently read an article by Jeffrey Beall decrying the quality of copy cataloging, based on how many institutions didn't correct errors from the master record in their local copies. I nearly threw the paper to the floor screaming, "But if you just let ONE of them fix the master record, they don't all have to find and fix the same stupid typo!!!!" Then I realized I was wanting to yell about cataloging and I returned to putting these things in perspective. :-)