Tuesday, November 07, 2006

More structured metadata

I often encounter people who see my job title (Metadata Librarian) and assume I have an agenda to do away with human cataloging entirely and rely solely on full-text searching and uncontrolled metadata generated by authors and publishers. That’s simply not true; I have no such goal. I am interested in exploring new means of description, not for their own sake, but for the retrieval possibilities they suggest for our users. So here are a few statements that begin to explain my metadata philosophy:

I want more automation. Throwing more money at a manual cataloging process is not a reasonable solution. First of all, it would take waaaaaaayyyyy more money than we can even dream of getting, and second, much metadata creation is not a good use of human effort. Let’s automate everything we can, saving our skilled people for the tasks that current automation is furthest from performing adequately. Let’s get the more objective types of metadata, such as pagination, from resources themselves or from their creators (including publishers). Let’s build systems that make data entry and authority control easy. Yes, there will be some mistakes. There will be mistakes if the whole thing is done by humans too. Is catching the few mistakes these automated processes will make more important than devoting our human effort to those few extra resources? More automation means more data in total, and the sorts of discovery services I have in mind need lots of that data.

I want more consistency. Users can’t find what’s not there. While we can’t prescribe that all records for all resources everywhere must have a large number of features (I’m against metadata police!), the more of those features that are there, the more discovery options our users have. Imagine a system that provides access to fiction based on geographic setting. Cool, huh? I read one book recently set in Cape Breton Island and can’t wait to get my hands on more. We can’t do that very well today because that data is in very few of our records, and when it is there, it isn’t always in the same place. The more consistent we are with our metadata, the better able we’ll be to build those next-generation systems.

I want more structure. I’m a big fan of faceted browsing. The ability to move seamlessly through a system, adding and removing features such as language, date, geography, topic, instrumentation (hey, I’m a musician…), and the like based on what I’m currently seeing in a result set is something I believe our users will be demanding more and more. But we can’t do this if that information isn’t explicitly coded. Instrumentation (i.e., “means of performance”) as part of a generic “subject” string isn’t going to cut it. Geographic subdivisions (even in their own subfield) that are structured to be human- rather than machine-readable also aren’t going to cut it. Nor are textual language notes, dates recorded as [ca. 1846?], or most GMDs. Many of these things can be parsed and turned into more highly structured data with some degree of success. But why aren’t we recording them in structured form in the first place? More structure = better discovery capabilities.
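To make the parsing point concrete, here is a minimal Python sketch of what recovering structure from a date statement like [ca. 1846?] might look like. The names (ParsedDate, parse_date_statement) and the pattern are my own illustrative assumptions, not part of any cataloging standard or library system, and the sketch handles only a single four-digit year with optional brackets, "ca.", and question mark.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedDate:
    """Hypothetical structured form of a transcribed date statement."""
    year: int
    approximate: bool  # "ca." was present
    uncertain: bool    # a trailing "?" was present
    supplied: bool     # enclosed in square brackets (supplied by the cataloger)


# Assumed pattern: optional "[", optional "ca.", a four-digit year,
# optional "?", optional "]". Anything else is treated as unparseable.
_DATE_RE = re.compile(
    r"""
    ^\s*
    (?P<open>\[)?\s*
    (?P<ca>ca\.?\s*)?
    (?P<year>\d{4})
    (?P<q>\?)?
    \s*(?P<close>\])?
    \s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_date_statement(text: str) -> Optional[ParsedDate]:
    """Return a ParsedDate, or None when the statement does not fit the pattern."""
    m = _DATE_RE.match(text)
    if m is None:
        return None
    return ParsedDate(
        year=int(m.group("year")),
        approximate=m.group("ca") is not None,
        uncertain=m.group("q") is not None,
        supplied=m.group("open") is not None and m.group("close") is not None,
    )


if __name__ == "__main__":
    for statement in ["[ca. 1846?]", "1846", "[1846]", "18--"]:
        print(statement, "->", parse_date_statement(statement))
```

Notice that a statement such as 18-- falls through as unparseable. That is the "some degree of success" problem: every variant needs its own rule after the fact, whereas recording the year and its qualifiers as separate coded values up front would need none.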

What this all means is I’m glad there are lots of extremely bright people with all sorts of perspectives and skills thinking about improved discovery for library materials, but that doesn’t necessarily mean throwing out metadata-based searching. The sorts of systems I envision require more, more highly structured, more predictable, and higher-quality metadata. I want more, not less.

I’ll stand on one last (smallish) soapbox before wrapping this up. In many communities (including both search engines and libraries), discussions about retrieval possibilities often center around textual resources. However, not everything that people are interested in is textual. That’s of course not a surprise, but I’m shocked at how often discovery models are presented that rely on this assumption. I’m all for using the contents of a textual resource to enhance discovery in interesting ways, but we need systems that can provide good retrieval for other sorts of materials too. Let’s not leave our music, our art, our data sets, our maps hanging out to dry while we plow forward with text alone.

4 comments:

Dorothea said...

Can I hear an AMEN?

Steve said...

"AMEN!"

(Didn't want to leave you hanging.)

Anonymous said...

Thank you. We need to "smarten up", not dumb down, cataloging/metadata records. Too many people seem to think that those of us arguing for updating cataloging practices to work better with contemporary technologies are arguing for a 'dumbing down'. Couldn't be more opposite.

Jonathan

Thom said...

Check out the Free Library of Philadelphia's Fleisher Collection MARC records, which have instrumentation requirements based on the Daniels Orchestral Music model (e.g., 1-2-2-2, etc.).

And for audio information...how about getting the timing by actually putting CDs into the computer and letting the application extract it, rather than absurdly ca.-ing everything; that would be better for machines too. And don't get me started on name-title analytics. Music and sound recordings continue to be round pegs in a square box world, at least for bibliographic description and access purposes. All-Music Guide is not perfect, but at least they got the model right for their information.