Wednesday, November 15, 2006
Children's Book Week
Reading all the touching stories of favorite childhood books across the biblioblogosphere in honor of Children's Book Week has guilted me into posting my own contribution. I still smile when I think of The Little Old Man Who Could Not Read, written by Irma Simonton Black and illustrated by Seymour Fleishman. It's the story of a man who cannot read. He goes to the grocery store and chooses items by the size and color of their boxes, trying to match them to products he knows he has at home. Of course, he ends up with an amusing assortment of unintended purchases. The story is touching, and the illustrations really make the point. Like many books from my childhood, it seems to be out of print (I see it was first published in 1968, before I was born), but it looks like Amazon can hook you up with a copy, as could many local libraries.
Tuesday, November 07, 2006
More structured metadata
I often encounter people who see my job title (Metadata Librarian) and assume I have an agenda to do away with human cataloging entirely and rely solely on full-text searching and uncontrolled metadata generated by authors and publishers. That’s simply not true; I have no such goal. I am interested in exploring new means of description, not for their own sake, but for the retrieval possibilities they suggest for our users. So here are a few statements that begin to explain my metadata philosophy:
I want more automation. Throwing more money at a manual cataloging process is not a reasonable solution. First of all, it would take waaaaaaayyyyy more money than we can even dream of getting, and second, much metadata creation is not a good use of human effort. Let's automate everything we can, saving our skilled people for the tasks that current automated methods are furthest from performing adequately. Let's get more objective types of metadata, such as pagination, from resources themselves or from their creators (including publishers). Let's build systems that make data entry and authority control easy. Yes, there will be some mistakes. There will be mistakes if the whole thing is done by humans, too. Is catching the few mistakes these automated processes will make more important than devoting our human effort to describing those extra resources? More automation means more data overall, and the sorts of discovery services I have in mind need lots of that data.
I want more consistency. Users can't find what's not there. While we can't require every record for every resource everywhere to include a large number of features (I'm against metadata police!), the more of those features that are present, the more discovery options our users will have. Imagine a system that provides access to fiction based on geographic setting. Cool, huh? I recently read a book set on Cape Breton Island and can't wait to get my hands on more. We can't do that very well today because that data is in very few of our records, and when it is there, it isn't always in the same place. The more consistent we are with our metadata, the better able we'll be to build those next-generation systems.
I want more structure. I'm a big fan of faceted browsing. The ability to move seamlessly through a system, adding and removing features such as language, date, geography, topic, instrumentation (hey, I'm a musician…), and the like based on what I'm currently seeing in a result set is something I believe our users will demand more and more. But we can't do this if that information isn't explicitly coded. Instrumentation (i.e., "means of performance") buried in a generic "subject" string isn't going to cut it. Geographic subdivisions (even in their own subfield) that are structured to be human-readable rather than machine-readable also aren't going to cut it. Nor are textual language notes, bracketed dates like [ca. 1846?], or most GMDs (general material designations). Many of these things can be parsed and turned into more highly structured data with some degree of success. But why aren't we recording them that way in the first place? More structure = better discovery capabilities.
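As a rough illustration of the kind of after-the-fact parsing described above, here is a minimal Python sketch that splits a transcribed date such as [ca. 1846?] into separate fields a faceted system could use. The field names, the `parse_date` function, and the regular expression are hypothetical choices for this example, not part of MARC or any cataloging standard, and the pattern only handles simple four-digit years.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedDate:
    year: Optional[int]  # None when the string could not be parsed
    approximate: bool    # "ca." was present
    questionable: bool   # a trailing "?" was present
    supplied: bool       # the date was enclosed in square brackets


# Hypothetical pattern: covers only forms like "1846", "[1846]", "ca. 1846", "[ca. 1846?]"
DATE_PATTERN = re.compile(
    r"^\s*(?P<open>\[)?\s*(?P<ca>ca\.\s*)?(?P<year>\d{4})(?P<q>\?)?\s*(?P<close>\])?\s*\.?\s*$"
)


def parse_date(text: str) -> ParsedDate:
    """Split a transcribed date string into explicit, machine-readable fields."""
    match = DATE_PATTERN.match(text)
    if match is None:
        # Anything outside the simple patterns above is left unparsed.
        return ParsedDate(None, False, False, False)
    return ParsedDate(
        year=int(match.group("year")),
        approximate=match.group("ca") is not None,
        questionable=match.group("q") is not None,
        supplied=match.group("open") is not None and match.group("close") is not None,
    )


if __name__ == "__main__":
    for raw in ["[ca. 1846?]", "1846", "[18--]"]:
        print(raw, "->", parse_date(raw))
```

A string like [18--] falls through to an empty result here, which shows the limits of recovering structure after the fact: parsing gets you some of the way, while coding the date explicitly when the record is created avoids the guesswork.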
What this all means is that I'm glad there are lots of extremely bright people with all sorts of perspectives and skills thinking about improved discovery for library materials, but improved discovery doesn't necessarily mean throwing out metadata-based searching. The sorts of systems I envision require more metadata, and metadata that is more highly structured, more predictable, and of higher quality. I want more, not less.
I'll stand on one last (smallish) soapbox before wrapping this up. In many communities (including both the search engine world and libraries), discussions about retrieval possibilities often center on textual resources. However, not everything that people are interested in is textual. That's of course not a surprise, but I'm shocked at how often discovery models are presented that rely on this assumption. I'm all for using the contents of a textual resource to enhance discovery in interesting ways, but we need systems that can provide good retrieval for other sorts of materials too. Let's not leave our music, our art, our data sets, and our maps hung out to dry while we plow forward with text alone.