Tuesday, October 23, 2007

Google Book Search and... LCSH?

The Inside Google Book Search Blog recently announced that they've "added subject links in a left navigation bar as additional entry points into the index." This, predictably, piqued my interest. I followed the links in the blog entry, poked around a bit, then looked at this book. (Hope that link is persistent... the book is "Asian American Playwrights: A Bio-Bibliographical Critical Sourcebook" by Miles Xian Liu.) [UPDATE: Either that link evaporated or I screwed up and pointed to the wrong place. Try this one.] See that "Subjects" heading over on the right? Expand it. At least some of those are LCSH! ("Asian Americans in literature" is a dead giveaway.) The first three are close to what one sees in the book and e-book records in Open WorldCat. I've been something very close to living under a rock recently, so maybe this isn't news, but it's news to me at least.

I don't quite know what to think of this. I've heard Google was getting MARC records for books they're digitizing from libraries, but this doesn't appear to be one of those books. Is this a sign they're incorporating library cataloging from other places as well? And to date we haven't seen them do much with that data. Is this the sign of a change? I don't know that we can interpret it that way. This is perpetual beta, remember, and it's Google, with roughly a zillion servers and the ability to try all sorts of things out simultaneously. Just because I see those headings now doesn't mean they'll be there tomorrow, or that they're there today for those of you hitting the service through a different route.

To some extent I think this is a good thing. We have a great deal of data in our catalogs that deserves to be put to better use than it currently is. It's great to see this data making its way into services such as GBS, and for GBS to realize "subjects" are useful, perhaps even essential, access points. (I'll skip in this post a rant about the many things "subject" can mean, including "genre" [pet peeve warning!], and my thoughts on when this data needs to be human-generated and when it doesn't.)

But I'm surprised to see the precoordinated headings there. One of them seems to have the free-floating --Biography and --Dictionaries removed, but Dictionaries stays in two of the headings. It's also interesting, although I don't know what it means, that the delimiter between parts of the heading in GBS is / rather than --. I'm wondering if there's any intelligent processing at work here or if this is a quick and dirty approach to providing subject access. These headings have a subfield structure that would make it trivial to just leave in the topical aspects (according to some definition of topical that doesn't match mine, especially for music) and remove the rest. Why wasn't this done? Does GBS perceive value in the precoordinated headings? Or have they just not spent time focusing on this yet?
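To make the "trivial" part concrete, here is a minimal Python sketch of that kind of filtering. It assumes a heading arrives as a list of (subfield code, value) pairs from a MARC 650 field, where $a is the topical term, $x a general subdivision, $v a form subdivision, $y a chronological subdivision, and $z a geographic subdivision. The function names and the sample heading are hypothetical illustrations, not taken from the actual record for this book, and I have no idea whether GBS does anything like this. Treating $a and $x as "topical" is just one possible definition.

```python
from typing import Iterable, List, Tuple

# One subfield of a MARC 650 field: (code, value), e.g. ("v", "Dictionaries").
Subfield = Tuple[str, str]

# Assumption: "topical" means the main term ($a) plus general subdivisions ($x).
# Form ($v), chronological ($y), and geographic ($z) subdivisions are dropped.
TOPICAL_CODES = frozenset({"a", "x"})


def topical_only(subfields: Iterable[Subfield],
                 keep: frozenset = TOPICAL_CODES) -> List[Subfield]:
    """Return only the subfields whose code is in `keep`, preserving order."""
    return [(code, value) for code, value in subfields if code in keep]


def render(subfields: Iterable[Subfield], delimiter: str = " -- ") -> str:
    """Join subfield values into a display string with the given delimiter."""
    return delimiter.join(value for _, value in subfields)


# Hypothetical heading, made up for illustration only.
heading = [
    ("a", "American drama"),
    ("x", "Asian American authors"),
    ("v", "Bio-bibliography"),
    ("v", "Dictionaries"),
]

full_display = render(heading)                      # library-style "--" delimiter
slash_display = render(heading, delimiter=" / ")    # GBS-style "/" delimiter
topical_display = render(topical_only(heading))     # form subdivisions removed

print(full_display)
print(slash_display)
print(topical_display)
```

Swapping the delimiter and dropping the form subdivisions are each a line or two once the subfield codes are available, which is why the headings as displayed make me wonder how much of that structure GBS actually received.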

It's my great hope that the way in which GBS ends up using library-originated subject headings sparks a great rethinking of how we provide subject access in the library community. We're heavily invested in the way we do things, and there's a great deal of value behind those ways. But just because there's some value doesn't mean that we can rest on our laurels. We simply must be continually evaluating how well our vocabularies perform in ever-evolving systems and against ever-evolving user expectations. How closely services like GBS stick to those vocabularies will be a litmus test for us. Ever the optimist, I hope we can use what they do as data to help us shape our evolution, rather than dismissing it as uninformed or not applicable to us. Only time will tell.


spinelabel said...

There's a harsh review of Google Books subject headings and suggested titles at the following: http://searchengineland.com/071006-092325.php .

I've played around with GBS a little bit lately. I agree that adding subject headings marks a big departure from the keyword-only style of search. It would acknowledge that the contents of a book cohere more than the contents of a web site. (Both have "pages", though.)

If Google starts to include subject headings, I expect it to behave more like an OPAC. Searching by subject headings is a good start. The lists that such searches generate show the spottiness of their collection so far. I get better and more current titles for now by searching a library's OPAC.

Laura Akerman said...

I'm interested in where Google got the headings. They do appear to be LC subject headings from the terms and structure. But three are not found in the LC (and OCLC) record for the book (including the two that could be "genre headings"). I'm doubting that Google has hired catalogers... but also doubting that machine processing produced these (but who knows?). Does Google have a financial arrangement with a cataloging source, or have they gleaned these headings from freely available catalog records somewhere?

Hannah said...

I know this is a little after-the-fact, but I know that Google does hire librarians, although the job title doesn't always include 'librarian'. I attended an AzLA (Arizona Library Association) conference in October 2006 and one of the lectures was about Google, given by a representative (with an MLS) who told us about the Google Librarian Central blog, where I've seen at least two open librarian positions mentioned (although it is not updated very often). So I don't know if that is contributing to the use of LCSH (or something closely resembling it), but they are hiring people with those skills.