Saturday, September 30, 2006

Librarians in the Media

CNET news published an article this week entitled, “Most reliable search tool could be your librarian.” While it’s nice to see librarians getting some press, I remain concerned about our image, both as presented in the media and as we present ourselves.

The article contains the usual rhetoric about caution in evaluating the “authority” of information retrieved by Web search engines, the need for advanced search strategies to achieve better search results, and the bashing of keyword searching. Here, as in so many other places, the subtext is that “our” (meaning libraries’) information is “better” – that if only you, the lowly ignorant user, would simply deign to listen to us, we could enlighten you, teaching you the rituals of “quality” searching and the location of deserving resources rather than that drivel out there on the Web that could be written by (gasp!) any yahoo out there.

Of course we know it’s not that simple. But the oversimplification is what’s out there. We’re not doing ourselves any favors by portraying ourselves (or allowing ourselves to be portrayed) as holier-than-thou, constantly telling people they’re not looking for things the right way or using the right things from what they do find, even though they thought they were getting along just fine. We simply can’t draw a line in the sand and say, “the things you find through libraries are good and the things you don’t are suspect.” There are really terrible articles in academic journals, and equally terrible books, many published by reputable firms. There are, on the other hand, countless very good resources out there on the Web, discoverable through search engines. And the line between the two is becoming ever more blurry as scholarly publishing moves towards open access, libraries put their collections online, government resources increasingly become Web-accessible, and search engines gain further access to the deep Web.

The first strategy I feel we should be taking is to move the discussion away from the resource and its authority and toward the information need. Evaluating an individual resource is of course important, but it’s not the first step. Let’s instead talk first about all the resources and search strategies that can meet a given need, rather than always focusing on resources and search strategies that can’t meet that need. There are many, many ways a user can successfully locate the name of the actor in the movie he saw last night, identify a source to purchase a household item at a reasonable price, find a good novel to read on a given theme, or learn more about how the War of 1812 started. Let’s not assume every information need is best met by a peer-reviewed resource. Instead, let’s make those peer-reviewed resources, and the mediation services we can offer for them, more accessible when they are appropriate to the information need at hand. Let’s be a part of the information landscape for our patrons, rather than telling them we sit above it.

Saturday, September 02, 2006

On "authority"

I recently got around to reading the response from Encyclopedia Britannica to the comparison of the “accuracy” of articles in Britannica and Wikipedia by Nature. It’s got me thinking about the nature of authority, accuracy, and truth.

Britannica’s objections to the Nature article arise from a different interpretation of the words “accuracy” and “error.” The refutations by Britannica fall into two general categories. The first is the disputation of certain factual statements, mostly when such facts were established by research. Here, these facts aren’t truly objective; rather, they’re a product of what a human is willing to believe based on the evidence. Different humans will draw different conclusions based on the same evidence. And then there’s the other human element: mistakes. We make them, both those of us who work for Britannica and those who work for Nature. The “error” rates Nature reported for both sources are astonishingly high. Certainly not all of these are true mistakes, maybe not even very many of them, but true mistakes exist in every resource humans create, despite any level of editorial oversight.

Second, and more prevalent, are differing opinions among reasonable people, even experts in a given domain, about what is and what isn’t appropriate to include in text written for a given audience. Anything but the most detailed, comprehensive coverage of a subject requires some degree of oversimplification (and maybe even those as well). By some definition, all such oversimplifications are “wrong” – it’s a matter of perspective and interpretation whether or not they’re useful to make in any given set of circumstances. Truth is circumstantial, much as we hate to admit it.

I’d say the same principles apply to library catalog records. First, think about factual statements. At first glance, something like a publication date would seem to be an objective bit of data that’s either wrong or right. But it’s not that simple. There are multitudes of rules in library cataloging governing how to determine a publication date and how to format it. Those rules require interpretation, so two catalogers can often reach different, equally reasonable decisions about what the publication date is. In cases where a true mistake has been made, our copy cataloging workflows require huge amounts of effort to distribute corrections among all libraries that have used the record with that mistake. Only sometimes is a library correcting a mistake able to reflect this correction in a shared version of a record, and no reasonable system exists to propagate that correction to libraries that have already made their own copy of that record. The very idea of hundreds of copies of these records, each slightly different, floating around out there is ridiculous in today’s information environment. We’re currently stuck in this mode for historical reasons, and a major cooperative cataloging infrastructure upgrade is in order.

More subjective decisions are not frequently recognized as such when librarians talk about cataloging. We talk as if one had only to follow the rules and the perfect catalog record would be produced, and as if two people following the same rules would produce identical records. But of course that’s not true. There will always be individual variation, no matter how well-written, well-organized, or complete the instructions. Librarians complain about “poor” records when subject headings don’t match their ideas of what a work is about. But catalogers don’t (and of course can’t) read every book, watch every video, or listen to every musical composition they describe. Why have we set up a system whereby we spend a great deal of duplicate effort overriding one subjective decision with another, based on only the most cursory understanding of the resources we’re describing, and keeping multiple but different copies of these records in hundreds of locations? How, exactly, does this promote “quality” in cataloging?

An underlying assumption here is that there is one single perfect cataloging record that is the best description of an item. But of course this isn’t true either. All metadata is an interpretation. The choices we make about vocabularies, level of description, and areas of focus all privilege certain uses over others. I’m fond of citing Carl Lagoze’s statement that "it is helpful to think of metadata as multiple views that can be projected from a single information object." Few would argue with this statement taken alone, yet our descriptive practices don’t reflect it. It’s high time we stopped pretending that the rules are all we need, changed our cooperative cataloging models to make them truly cooperative, and used content experts rather than syntax experts to describe our valuable resources.
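
To make the "multiple views" idea a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `InformationObject` class, its fields, the three view functions, and the sample values are assumptions, not any real cataloging standard or system. It only shows that several different descriptions, each useful for a different purpose, can be projected from one underlying object without any of them being "the" record.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InformationObject:
    """Hypothetical pool of everything known about one item."""
    title: str
    creator: str
    date_issued: str                      # ISO-style date string (assumed)
    subjects: Tuple[str, ...]
    instrumentation: Tuple[str, ...] = ()


def brief_citation_view(obj: InformationObject) -> Dict[str, str]:
    """A view for someone who just wants to cite the item."""
    return {"title": obj.title, "creator": obj.creator, "year": obj.date_issued[:4]}


def subject_browse_view(obj: InformationObject) -> Dict[str, object]:
    """A view for someone browsing by topic."""
    return {"title": obj.title, "subjects": sorted(obj.subjects)}


def performer_view(obj: InformationObject) -> Dict[str, object]:
    """A view for a musician deciding whether the piece fits an ensemble."""
    return {"title": obj.title, "instrumentation": list(obj.instrumentation)}


# Illustrative, made-up sample data.
score = InformationObject(
    title="Example Quartet",
    creator="A. Composer",
    date_issued="1806-05-01",
    subjects=("String quartets", "Chamber music"),
    instrumentation=("violin", "violin", "viola", "cello"),
)

views: List[Dict[str, object]] = [
    brief_citation_view(score),
    subject_browse_view(score),
    performer_view(score),
]
for view in views:
    print(view)
```

None of the three views is more correct than the others; each simply favors a different use, which is the point of Lagoze's statement.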