Wednesday, May 25, 2005

Z39.19 Revision

Well, today is the deadline for comments to NISO on the new Z39.19 revision, and unfortunately I haven't had a chance to dig in far enough to make any comments useful to them. Blast. I did, however, open the document up today to look for something specific. In the course of that search, I came across this text:

Parts of Multiple Wholes
When a whole-part relationship is not exclusive to a pair
wholes, the name of the whole and its part(s) should not
they should be linked associatively rather than hierarchically
Carburetors, for example, are parts of machines other
relationship in this instance is cars RT carburetors.

I'm disappointed in this decision. In order to preserve a pure hierarchy (something cannot be a part of multiple wholes), some semantics are lost. The idea that a carburetor is a part of a car (as well as potentially a part of lots of other stuff) is lost by relegating it to an RT (associative relationship). Whole-part relationships appear in the document as one of three types of hierarchical relationships; therefore, it seems that by categorizing them here the authors were forced to make the decision to move a huge number of things commonly thought of as having a whole-part relationship to an associative relationship. We librarians just love hierarchy, don't we. Too bad the world is polyhierarchical. Looks like our information systems won't be able to catch up yet.
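
To make the complaint concrete, here is a minimal sketch of a vocabulary structure in which a part can belong to more than one whole. It is only an illustration, not anything proposed in Z39.19: the `Thesaurus` class and its method names are made up for this post, and "motorcycles" and "lawn mowers" are just sample wholes I picked.

```python
from collections import defaultdict


class Thesaurus:
    """Toy vocabulary in which a part may belong to several wholes (hypothetical)."""

    def __init__(self):
        self.wholes = defaultdict(set)  # part -> set of wholes it belongs to
        self.parts = defaultdict(set)   # whole -> set of its parts

    def add_part_of(self, part, whole):
        # Record the whole-part link in both directions.
        self.wholes[part].add(whole)
        self.parts[whole].add(part)

    def wholes_of(self, part):
        # .get avoids creating empty entries for unknown terms.
        return sorted(self.wholes.get(part, ()))

    def parts_of(self, whole):
        return sorted(self.parts.get(whole, ()))


thesaurus = Thesaurus()
# Sample data only: the wholes other than "cars" are illustrative.
thesaurus.add_part_of("carburetors", "cars")
thesaurus.add_part_of("carburetors", "motorcycles")
thesaurus.add_part_of("carburetors", "lawn mowers")

print(thesaurus.wholes_of("carburetors"))
print(thesaurus.parts_of("cars"))
```

Nothing about the structure forces a single parent, so the "carburetor is part of a car" statement survives alongside the other wholes instead of being flattened into an RT.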

Monday, May 23, 2005

"In Search of the Single Search Box"

Whew! It sure has been a while since I've posted. When starting this blog, finding the time to post on it was one of my major concerns. I'd been doing pretty well, but I recently hit a stretch where I was traveling more than I was home for about 6 weeks, and I moved in there as well! But I'm back now, and should be closer to home for a large portion of the summer. Here's to keeping up the blog while sitting on my new patio with a frosty beverage!

I heard an excellent presentation recently at the Digital Library Federation Spring Forum, which has been referenced recently on a library mailing list (WEB4LIB?). Staff at NC State have developed some methods for making a single search box on the library's web site actually return relevant information for the many types of queries users type into it, no matter how much explanatory text on the page indicates what resources the box searches. The presentation was titled "In Search of the Single Search Box: Building a 'First Step' Library Search Tool." (Firefox users beware: the presentation is in HTML-ized PowerPoint and will look really strange in your browser!) Their video demo does an excellent job of illustrating the types of information needs to which the tool can respond. As the presentation suggests, this box doesn't search inside absolutely everything, but is intended to be a first step from which users can see some ideas and choose among them for continuing their journey.

As I recall (this is what I get for waiting this long to post on the topic...), the tool presents results in four major categories:

1) FAQ for the libraries
2) Library web pages
3) Links to perform the same search in some databases (the catalog, Academic Search Premier, list of journal titles, etc.)
4) Related subject categories

The FAQs meet needs where somebody wants to know the library hours or where the closest computer lab is. The library web pages results are Google-driven, so a page excerpt appears that a user might find helpful in selecting a result when they want some contextual information about a resource. The "search the collection" links make catalog or database search results an extra click away if that was the desired search, but that click is simply moved from the beginning of the process (click a catalog link on the home page, or, alternatively, take a few minutes to figure out which box on the front page to type in!) to this stage.

The "Browse Subjects" area, where a list of potentially relevant subjects is displayed, piques my interest most about this project. The presentation didn't have a ton of information about where these links go and how the logic to develop them is created, and unfortunately I didn't have a chance to ask the NC State folks in person more about it. But from the presentation and the demo video, it looks like these links go to pathfinder-style pages where "selected" resources (selected how and by whom would presumably be a local implementation decision) are displayed or linked. The presentation slides state that journal article titles and course descriptions are currently used to provide the connections between search terms and the pre-defined subjects. That's a great place to start! One can imagine a host of other options, including subject authority files, those same library web pages indexed elsewhere, and periodic looks at search logs for this box. Oh, and I see now one of the final slides in the presentation talks about some other sources - I'd forgotten that! I find the huge amount of potential here very exciting.
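
I don't know how NC State actually builds these connections, but the general idea of linking search terms to pre-defined subjects through text already associated with each subject can be sketched roughly as follows. Everything here is an assumption for illustration: the subject names, the sample titles and course descriptions, the stopword list, and the function names `build_index` and `suggest_subjects`.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: text already associated with each pre-defined subject
# (for example, a journal title and a course description).
SUBJECT_TEXTS = {
    "Chemistry": ["Journal of Organic Chemistry",
                  "Introduction to chemical bonding and reactions"],
    "Music": ["Journal of Music Theory",
              "Survey of music history and analysis"],
    "Textiles": ["Textile Research Journal",
                 "Fiber science and textile manufacturing"],
}
STOPWORDS = {"of", "and", "the", "to", "in", "journal", "introduction", "survey"}


def tokenize(text):
    # Lowercase, split on whitespace, and drop very common words.
    return [word for word in text.lower().split() if word not in STOPWORDS]


def build_index(subject_texts):
    # word -> Counter of subjects whose associated text contains that word
    index = defaultdict(Counter)
    for subject, texts in subject_texts.items():
        for text in texts:
            for word in tokenize(text):
                index[word][subject] += 1
    return index


def suggest_subjects(query, index, limit=3):
    # Score each subject by how often the query words occur in its text.
    scores = Counter()
    for word in tokenize(query):
        scores.update(index.get(word, {}))
    return [subject for subject, _ in scores.most_common(limit)]


index = build_index(SUBJECT_TEXTS)
print(suggest_subjects("music theory", index))
print(suggest_subjects("library hours", index))
```

This sketch does exact word matching only, so "chemical" would not match "chemistry"; a real tool would presumably need stemming or something smarter, plus far more source text per subject.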

This tool isn't currently deployed on the NC State Libraries Web site, but I hope to see it soon. I don't recall if they plan to release any of their source code, but it sure would be nice if this were possible. I'll be keeping an eye on developments in this area.

Oh, and by the way. Never. Moving. Again. :-)

Saturday, May 07, 2005

Cataloging sound recordings

There has been a fascinating discussion on the Association for Recorded Sound Collections (ARSC) email list over the last week on cataloging sound recordings (look for threads starting with "database template" and "cataloging," then continuing here). The ARSC community is wonderfully diverse, including audiophiles, librarians, archivists, and others just interested in learning about sound recordings. The thread started out with an announcement of a database template for recording information about sound recordings; someone solving an immediate problem and wanting to share their solution with others. It's expanded greatly to become somewhat of a religious discussion on the relative merits and problems of MARC/AACR2 cataloging.

I can't help but feel that, like a great many discussions of this sort, the participants are talking past each other. One point that has been mentioned, but perhaps not strongly enough, is that the user experience problems with library cataloging are largely a problem of how the search system uses the data and how it's presented to end-users. Ralph Papakhian, one of the premier music catalogers in the country, whom I like and respect a great deal, has made the point in this thread that the data elements some respondents mention as wanting to record are in fact recordable in MARC. And if anyone would know and can explain this to others, it's Ralph. But these elements, even though they're there, are often not accessible to users. For example, MARC has fields for date of composition and coded instrumentation of a recording or score. But few if any library systems index or display this data. So catalogers rarely enter them, which provides less incentive for systems to use them, which provides less incentive for catalogers to use them, which provides less incentive for systems to use them...

But I believe systems aren't the only problem. There are lots of little things I think MARC/AACR2 could do better. However, the biggest difference between what MARC does and what some of the other participants in this thread look for in sound recording cataloging, and one that is mostly implicit in this discussion, is the library focus on the carrier over the content. Catalogers discuss this issue frequently, but it hasn't been brought up explicitly in this thread. Audiophiles absolutely are interested in the recording as a whole--its matrix number, sound engineers, etc. But they are equally interested in the musical works on the recording, what personnel are connected with which piece, timings of tracks, etc. MARC has places for these things, but they are relegated to second-class status. Catalogers know and tout the benefits of structure and authority control in information retrieval. But when it comes to the contents of a bibliographic item, we apply none of these principles in the MARC environment. Contents notes are largely unstructured (and what structure is possible is rarely used and keeps changing!), don't make use of name or title authority control, and in many cases aren't indexed in library systems.

As pointed out in this thread, creating this content-level information is extremely expensive. But the networked world has the potential to change that. Much of this information has been created in structured form outside of the library environment, by record companies, retailers, and enthusiasts, but we don't make use of it. Right now, it's difficult to make use of it because our systems don't know how to talk to each other. It will take a great many baby steps, but I hope we can start down the road towards changing that.

Matt Snyder of NYPL, whom I met at MLA this year and was extremely impressed with, has made the point in this thread that MARC records (and, by extension, library catalogs) and discographies have different purposes. This is definitely true in today's environment. Library catalogs are primarily for locating things, and discographies have more of a research bent. But I feel strongly, and this email discussion seems to support this view, that the distinction is largely artificial and is becoming less relevant as information retrieval systems continue to evolve. More sharing of data between systems will hopefully result in fewer systems for end-users to consult. That's certainly my goal!

Thursday, May 05, 2005

Known-item vs. unknown-item searching

A series of project assignments and offhand conversations recently have me thinking about how well (or how poorly) our current library-ish systems support users diving in and simply exploring what the system has to offer. On the whole, most of our discovery systems focus on known-item searching, where a user comes to the system with something specific in mind that they want to find: books by a certain author, a movie with a specific title, recordings by a particular artist. These information needs are of course common, and they are in fact the focus of Cutter's first objective of the catalog.

But look more closely at c) in that first objective - we should provide access to an item when the subject of it is known. So what exactly does that mean? Most current systems in a library environment fulfil that by making text in a subject-ish field keyword searchable. When I do a subject search in a system of that sort, I get back records that have subjects containing the word I typed in. But how do users know what the words in those subjects are? Some (certainly not all!) systems provide the user a way to look at a list of subjects used in that system. The user then is expected to locate all subjects of interest in that list, then construct a properly-formulated Boolean query OR-ing those subjects together. I'll be perfectly frank and state that I believe strongly that this is silly to expect of any user in this day and age, even an "expert" user such as a reference librarian. Let's use the computing power we have!

And what about these of Cutter's objectives?

2. To show what the library has
e. On a given and related subjects
f. In a given kind of literature

Mechanisms to achieve these goals, in support of unknown-item searching, fall far short of the sophistication we provide for known-item searching. We don't provide our users with ways to look around, to explore, to just see what we've got. If I read a book that inspires me to read some more on the topic, I go to my public library's catalog, find the book I liked, and click on a subject heading (from a maximum of three!) that seems like it might be promising. And what I find a huge majority of the time is a browse screen of LCSH headings, each with three or fewer hits. The topic of interest to me tends to be the first part, but the browse index is a seemingly endless list of geographic subdivisions of the topic, interspersed with other subdivisions such as "juvenile," and, in particularly poor systems, interspersed with other headings starting with the same word as the term before the first subdivision.

What we need are systems that do an exponentially better job of starting out from an interesting thing and finding more things like it. I personally think postcoordinated subject headings would be a major advance in this area, but they're certainly not enough. Systems that map lead-in terms to authorized terms, and expand search results to include narrower terms than a matched broader term are also necessary. One can also imagine other mechanisms to build that "like" relationship, based on information retrieval research, folksonomies, and transaction logs.
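
The lead-in mapping and narrower-term expansion described above can be sketched in a few lines. This is a toy under stated assumptions, not a description of any existing system: the vocabulary in `LEAD_IN` and `NARROWER`, the records in `RECORDS`, and the function names are all invented for illustration.

```python
# Hypothetical vocabulary and catalog data, for illustration only.
LEAD_IN = {"autos": "automobiles", "cars": "automobiles"}  # entry term -> authorized term
NARROWER = {
    "automobiles": ["sports cars", "electric automobiles"],
    "sports cars": ["roadsters"],
}
RECORDS = {
    "rec1": {"automobiles"},
    "rec2": {"roadsters"},
    "rec3": {"bicycles"},
}


def authorized(term):
    # Map a lead-in (entry) term to its authorized form; pass other terms through.
    term = term.lower().strip()
    return LEAD_IN.get(term, term)


def expand(term):
    # Collect the term plus all of its narrower terms, at any depth.
    # The "seen" set guards against cycles in the vocabulary data.
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(NARROWER.get(current, []))
    return seen


def subject_search(query):
    # Return IDs of records carrying the authorized term or any narrower term.
    terms = expand(authorized(query))
    return sorted(rec for rec, subjects in RECORDS.items() if subjects & terms)


print(subject_search("cars"))
print(subject_search("Bicycles"))
```

A search on the lead-in term "cars" retrieves the record indexed under "roadsters" as well as the one under "automobiles", without the user ever having to look up either heading or OR anything together.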

I suppose my point in the end is that it's simple to build a system that searches the text of pre-created metadata fields for an entered query string. It's much more difficult to build systems that allow users to truly explore. We often forget how important that exploration function is. We look at our search logs, and see mostly known-item searches, so we think that's what we need to focus on. Of course we see that - it's what our systems are designed around! But what would happen if we started to provide relevant results to subject and other unknown-item searches? I'd bet a whole lot of money that we'd see a huge increase in unknown-item searching. Sure, for some types of materials, known-item searching may very well be the primary means of access users need. But let's at least look at the alternative, and work with actual users to see how we can provide them with exploratory functions we don't currently supply.

Tuesday, May 03, 2005

FRBR Workshop

Wow! Wow, wow, wow, and WOW. I'm at the end of day 2 of a 2.5 day FRBR Workshop at OCLC, and I've been continuously blown away by the activity going on here. The workshop is supposed to be in large part a working session to start thinking about what revisions to the original FRBR report would look like. I was skeptical of that goal coming in, seeing as 75 people are here, but I've been extremely pleasantly surprised. If I've ever been in a room with as many bright, engaged, and interesting people before, I didn't appreciate it at the time. Within the discussion, I find just the right balance of theory and practice, of idealism and realism. There's a very clear vision of what a bibliographic future could be, and a great many ideas for ways we can get there in manageable steps.

The workshop itself is a mixture of presentations on specific topics and time to just talk. Some presentations don't at first glance look to be FRBR related, but every single one really does have a definite impact on how FRBR should develop in the future, either as a conceptual model or as some sort of implementation model based on the conceptual one. Some presentation slides are on the workshop site now, and hopefully all will eventually be. But the presentation slides in no way do the actual presentations and the resulting large- and small-group discussions justice. I feel more confident than at most meetings of this type that the discussions will have real results, in the form of writings and implementations. I sincerely hope so - many people out there are interested in this topic, and the best thing we can do now is share, share, and share some more.