Sunday, November 18, 2007
A bright future for bibliographic control
For example, the recommendation that RDA development be abandoned until FRBR is better understood could be a very good thing or a very bad thing, in my opinion, depending on how the recommendation is written. Karen Coyle, consultant to the group, involved in helping to write the report, has already indicated that the report's recommendations will be presented differently than they were in the presentation, addressing RDA and FRBR separately. I fall in the camp that believes the currently-released RDA drafts fall short of the sea-change for which they strive, but I also think we need to be making this change sooner rather than later. We'll see, I suppose.
I was also thrilled to hear in Barbara Tillett's question from the audience that LC is close to making some of their vocabularies (she didn't say which ones, presumably things like the language codes - I suppose it's too much to hope LCSH would be included) available via SKOS. This is the first I've heard of that initiative, but I think it's fantastic. It's something my institution would be able to take advantage of fairly quickly.
But back to the big picture, which I think is the most significant part of this week's presentation. The working group has outlined a vision that goes far beyond the mechanics of bibliographic control, into the scope and functions of discovery and use applications. I think this is a significant development, one that has gone in exactly the right direction. It only makes sense to analyze how we do bibliographic control if we fully understand what it is that work is supporting. I don't recall any of the webcast presenters saying this in so many words, but the vision that was outlined was not just one for library catalogs as we define them today. It covered information systems in the larger sense, and how they could interact. The future of bibliographic control is not just the design of records we create and store; it's a fluid, connected, living information environment in which libraries are but one (albeit important) player. What an exciting time to be a librarian!
Tuesday, October 23, 2007
Google Book Search and... LCSH?
I don't quite know what to think of this. I've heard Google was getting MARC records for books they're digitizing from libraries, but this doesn't appear to be one of those books. Is this a sign they're incorporating library cataloging from other places as well? And to date we haven't seen them do much with that data. Is this the sign of a change? I don't know that we can interpret it that way. This is perpetual beta, remember, and it's Google with roughly a zillion servers, and the ability to try all sorts of things out simultaneously. Just because I see those headings now doesn't mean they'll be there tomorrow, or even today for those of you hitting the service through a different route.
To some extent I think this is a good thing. We have a great deal of data in our catalogs that deserves to be put to better use than it currently is. It's great to see this data making its way into services such as GBS, and for GBS to realize "subjects" are useful, perhaps even essential, access points. (I'll skip in this post a rant about the many things "subject" can mean, including "genre" [pet peeve warning!], and my thoughts on when this data needs to be human-generated and when it doesn't.)
But I'm surprised to see the precoordinated headings there. One of them seems to have the free-floating --Biography and --Dictionaries removed, but Dictionaries stays in two of the headings. It's also interesting, although I don't know what it means, that the delimiter between parts of the heading in GBS is / rather than --. I'm wondering if there's any intelligent processing at work here or if this is a quick and dirty approach to providing subject access. These headings have a subfield structure that would make it trivial to just leave in the topical aspects (according to some definition of topical that doesn't match mine, especially for music) and remove the rest. Why wasn't this done? Does GBS perceive value in the precoordinated headings? Or have they just not spent time focusing on this yet?
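To make the "trivial" claim concrete, here is a minimal sketch of what I mean. It assumes the usual MARC 21 field 650 subfield codes ($a topical term, $x general subdivision, $v form subdivision, $y chronological subdivision, $z geographic subdivision). The sample heading and the function names are made up for illustration; I have no idea what GBS actually does internally.

```python
# Hypothetical sample: one subject heading as (subfield code, value) pairs,
# using MARC 21 field 650 subfield codes.
TOPICAL_CODES = {"a", "x"}  # $a topical term, $x general subdivision


def topical_only(subfields, keep=TOPICAL_CODES):
    """Return only the heading parts whose subfield code is in `keep`."""
    return [value for code, value in subfields if code in keep]


def display(parts, delimiter=" -- "):
    """Join heading parts into a display string with the given delimiter."""
    return delimiter.join(parts)


# Illustrative heading, not taken from any real record.
heading = [
    ("a", "Composers"),
    ("z", "Italy"),
    ("v", "Biography"),
    ("v", "Dictionaries"),
]

full_string = display([value for _, value in heading])
topical_string = display(topical_only(heading), delimiter=" / ")
print(full_string)
print(topical_string)
```

Which codes count as "topical" is exactly the kind of definitional choice I grumbled about above; changing the `keep` set is all it takes to try a different definition.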
It's my great hope that the way in which GBS ends up using library-originated subject headings sparks a great rethinking of how we provide subject access in the library community. We're very vested in the way we do things, and there's a great deal of value behind those ways. But just because there's some value doesn't mean that we can rest on our laurels. We simply must be continually evaluating how well our vocabularies perform in ever-evolving systems and user expectations. How closely services like GBS stick to those vocabularies will be a litmus test for us. Ever the optimist, I hope we can use what they do as data to help us shape our evolution, rather than dismissing it as uninformed or not applicable to us. Only time will tell.
Saturday, October 13, 2007
Catalog vs. search engine - or is it?
Discussion on the future of library catalogs is common today. In these conversations, I often hear an argument something like this: “Catalogs and search engines have different goals; are trying to accomplish different things. Therefore we shouldn’t be making direct comparisons between them. By extension, we shouldn’t be comparing their functionality and features either.” This is of course an oversimplification of what’s generally said, but the spirit is there.
I’m concerned about this line of thinking. The original posit makes sense on the surface, in the sense that there is a history of analyzing and documenting the goals of the catalog (Panizzi, etc.), and that the business goal of search engines is to make money by selling advertising. But I think this approach both sells search engines short and doesn’t go far enough thinking about catalogs. From the search engine point of view, the business argument is true, of course, but overly simplistic. We can extend the definition of the goal of search engines to say that they strive to make money by selling advertising in a system that connects people to information they seek. Google wasn’t a business at first; it started as a research project by CS students to better index information. That’s a pretty simple and laudable goal – to help people find things. The catalog is the same. With all the talk about the goal of the catalog being collocation (and all the other related goals well-documented in the literature), it’s easy to forget that those goals exist (wait for it…) to connect people, today and in the future, with information they seek. So in this very basic sense, catalogs and search engines are trying to accomplish the same thing. The methods are often different, but I don’t think we’re serving ourselves well if we just write the success of search engines and the current struggles of library catalogs off because of those differences.
Early search engines had one big difference from library catalogs: the materials they index. But this is no longer true to any significant degree. I’m no fan of cataloging web sites in MARC to make them searchable in our catalogs, and I see this as largely out of favor now, but this was only the first step towards blurring the line between the content indexed by search engines and that in our catalog. Google Book Search, for example, provides access to many of the same materials that are in our catalogs. The methods of searching are very different, with full-text indexing being a strong component of GBS and bibliographic information the strongest component of our catalogs, but again, the goal is the same – getting people to books relevant to their information need. The argument separating catalogs from search engines by format of materials indexed is waning, but I still hear it from time to time. The conventional argument that a catalog provides access to things a library owns is also waning, for obvious reasons.
So what’s left to distinguish the goals of our catalogs from those of search engines, giving us a convenient excuse for why our catalogs perform so poorly? Not much of substance, I think. To me, the difference is all in style instead. Let’s certainly keep those goals of the catalog in mind, but let’s not assume that the methods we’ve used to achieve those goals in the past are the only methods that can be effective. If the goals of search engines and catalogs aren’t all that different in the end, maybe we can mix and match some methods too. We’ll never know until we try.
Thursday, August 02, 2007
Can LIS education learn something from CS?
But seeing some discussion recently about whether or not some of the more technical positions in libraries should require an MLS, I'm wondering if there's something we can learn from the CS community. Technical jobs commonly require a BS in Computer Science (note: NOT a graduate degree, whether it be research-based or professional), or demonstrated expertise in the task at hand, say, programming. That expertise can be demonstrated through that degree, through various certification programs, or by showing code one has written. While I suspect some would argue that the MLS is equivalent to those certification programs, I'm not so sure. A certification program for, say, Windows server administration, would be based on many practical tasks, and we don't see many of those in our library schools.
While I do agree we should be teaching the theory of things and then applying the practice on top of it, I see our library schools failing our students by ONLY teaching the former and providing no opportunity for the latter. Even single undergraduate programming classes manage to teach both. Can't we learn something from that?
Monday, July 30, 2007
Cutting through the rhetoric about subject headings
It seems to me that a great many of our disagreements in the library realm have at their root people talking past each other, each side meaning something different by a given term or two, but not cognizant of that fact. I see LCSH as a prime example of this phenomenon. A great deal of debate occurs over whether precoordinated subject strings or postcoordinated subject strings are more useful. But I see a fundamental difference in the way various participants in these discussions define “postcoordinated.”
One definition is that postcoordinated headings have no subdivisions at all; in LCSH-speak, have no --. The other definition is that postcoordinated headings are “faceted” (to introduce another term that complicates the issue); that each heading reflects only one characteristic of the work, such as “topic,” “place,” or “date.” The difference here is the difference between “subdivisions” and “facets.” These two concepts are not identical. A common criticism of postcoordinated headings is that they would not represent the essential distinction between concepts like “History--Philosophy” and “Philosophy--History.” While (ignoring the syntax; whether or not the double dashes are used is a style issue) this would be true according to the first definition, it’s not necessarily true according to the second—a “topical” facet may very well represent a complex concept. I’ve never seen a discussion on this issue in which this distinction is made clear to both sides. It’s unclear to me whether the “traditional” definition of postcoordinate allows the faceted interpretation or requires the subdivision interpretation, but I think what’s needed here is clarification of current definitions rather than historical ones.
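A small sketch may help show why the two definitions lead to different conclusions. Everything in it (the function name, the facet names, the record layout) is an assumption of mine for illustration, not an established model of either approach.

```python
def split_subdivisions(heading):
    """Definition 1: break the string at every '--' and keep the bare terms.

    The result is an unordered set, so the order of the terms is lost.
    """
    return frozenset(part.strip() for part in heading.split("--"))


# Under definition 1 the two headings become indistinguishable.
flat_a = split_subdivisions("History--Philosophy")
flat_b = split_subdivisions("Philosophy--History")
print(flat_a == flat_b)

# Definition 2: each facet holds one characteristic of the work, but the
# value inside the "topic" facet may itself be a complex concept.
faceted_a = {"topic": ["History--Philosophy"], "place": [], "date": []}
faceted_b = {"topic": ["Philosophy--History"], "place": [], "date": []}
print(faceted_a == faceted_b)
```

Under the first definition the comparison comes out equal, which is the loss critics worry about. Under the second the two records stay distinct, because the complex concept lives inside a single topical facet.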
I don’t have all the answers in this debate, nor does anyone else at this point. My inclination is toward the postcoordinate side, although I do very much want to keep an open mind on the issue. I’d like to see a well-reasoned argument for a postcoordinate system presented according to the facet definition (something I’ve long been wanting to write but find this is one of the many issues that have trouble finding their way from my brain to a shareable form). I personally read arguments for precoordinate indexing and think to myself, “We can do all of this with postcoordinate headings if we had systems that operated reasonably.” (Big IF there, considering our current state of affairs!) We need to have more room to experiment with these options to see if my interpretation is a good one. The Endeca use of precoordinated strings shows powerful promise; we need more large-scale implementations of systems working off of postcoordinated data to allow us to compare both user functionality and cataloging time (a much-forgotten but essential factor) of the two approaches. I want data, darn it! We can only go so far with the philosophical argument; to get beyond our current roadblock we need to see what will happen if we follow the various paths available to us.
Sunday, July 08, 2007
Everything I know about librarianship I learned from Star Wars
Forgive the hyperbole—of course it’s not everything. But hear me out.
In celebration of meeting a major deadline and milestone in my career, I took some time for myself this weekend and watched the original Star Wars trilogy. (Yes, the good one.) This is something of a ritual for me, albeit one I’ve performed only one other time in recent memory. It stretches back to junior high days when my brother and I, when we had a day off of school, would frequently watch all three movies right in a row off of a somewhat wobbly VHS tape made from early HBO airings. (If you’ve ever done this, you know just how very boring Jedi gets in the middle, but I digress.) Nowadays, three movies in three days is about all I have patience for (and Jedi still got boring), but it was comforting nonetheless.
While watching, I found myself saying lines out loud before they were said on screen, an annoying habit of mine. A few of these lines struck me as interesting, however. While Star Wars isn’t exactly the pinnacle of Western philosophy, my brain made some funny connections between the storylines and dialogue I know so well and librarianship. Here are a few examples:
“Use the force, Luke.” (Disembodied Obi-Wan voice to Luke, in the first movie.) We need to trust ourselves as skilled professionals. We know what we’re doing. Most of us in the library profession are in it because we love the work and believe we really can make a difference. This heavy personal investment in our work gives us the luxury to rely on our instincts in many cases, pushing forward with initiatives that are simply the right thing to do. Now we need to back up that instinct with reasonable plans, budget justifications, and all that administrative stuff, but I really do believe the best ideas come out of pure inspiration and vision, facilitated by the connections between us.
“R-2, you know better than to trust a strange computer.” (C3PO, Empire.) Now, the computer turned out to be right in this case, but we as librarians consider it part of our job to promote the effective evaluation of information. Many of the discussions today around this issue take an adversarial tone, as if the goal is to spot the misinformation and quash it. But we simply can’t just look at it as ferreting out the bad. We must not be judgmental. Instead, this evaluation can and should be just a routine part of our information flow. We simply need to evaluate everything. The source is only one factor among many that should be considered.
“What I told you is true, from a certain point of view.” (Blue-energy Obi-Wan dude, Jedi.) The role of perspective in truth or falsehood could provide more commentary on the evaluation of information theme, but I’ll take it in a slightly different direction – the role of metadata records in libraries. I find myself talking about this topic, inspired by Carl Lagoze, a great deal (and I believe on this blog before). A metadata record is necessarily a surrogate for a resource, and thus inherently takes a certain perspective on that resource in what it includes, what it leaves out, and the vocabularies it uses. We need to dispel ourselves of the myth that our records can or should be all things to all people, and instead focus on defining the views our metadata records need to support.
And of course the overall theme of the movies that a relatively small, smart, dedicated movement can effect sorely needed, large-scale change gives me good feelings for the future of libraries as well. So there you go. Little did Mr. Lucas know he was providing the library profession with a model to help guide our work. :-)
Oh, and I learned that my dog is strangely fascinated by Ewoks. Go figure.
Saturday, June 16, 2007
Re-imagining browsing
In a recent issue of Educause Review, Robert Kieft describes work at the Tri-Colleges outside
I like this idea, as when I do shelf browsing, I do open up books, read a random page or two, and just generally poke around to see if the book looks interesting. I also believe this is an area in which our current catalogs don’t remotely match the physical experience.
But I also think there’s more to browsing than having something catch your eye then explore it further. When shelf browsing, we don’t look at everything – we only look at select things. In a bookstore, a cover might catch one’s eye, but this is less likely to happen on shelves with plain library bindings. A title or an author might be the hook that causes one to pick up one book and not another. To some extent, this type of browsing activity is random.
But what if it were less random? Can we re-imagine (or at least extend) our notion of browsing to make it more targeted? I like to think of browsing both as the ability to “look inside” a resource to learn more about it before committing to it, and as a “more like this” feature that introduces me to resources that I didn’t previously know existed. Shelf browsing of course does this but is also obviously limited by physical constraints, so that only one aspect of a work can be brought out by a classification scheme that locates it on the shelf. This isn’t news to anyone—see for example, the much-blogged-about Everything is Miscellaneous by David Weinberger for a discussion of what he calls first-, second-, and third-order methods of organization.
What I’m interested in is the ability of our catalogs to bring out more flexibility in browsing. If I’m looking at a resource, I want to be able to note one feature of it, and instantly get other resources that share that feature. I’ll then want to be able to add or subtract features to exert control over the size and characteristics of my results set. Say I’ve just read The Godfather. I might want more mafia fiction after that. But that’s a long list that I might consider too broad. Perhaps I’d limit myself to novels about the Sicilian mafia, or set in the 1940s or ‘50s. Then after I select a work or two, I may want to move on to real-life mafia stories, but only from the mobster’s perspective (not law enforcement’s). I’d likely find Henry Hill’s story in Nicholas Pileggi’s Wiseguy, which could lead me to watching the movie Goodfellas, and start wondering what happened to Hill. From there I might start exploring the history of the witness protection program, then FBI training methods, then perhaps all the way to Thomas Harris’ Silence of the Lambs and the film that was made based on the novel! And now imagine we can do all of that in a few clicks, without having to type anything in, or to know any of those books or films exist. And imagine if this could happen across multiple databases of content (including those from outside the library sector, such as IMDB), without me ever having to know that.
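Here is a rough sketch of the "add or subtract features" idea, assuming each resource simply carries a set of feature labels. The tiny catalog, the labels assigned to each title, and the `browse` function are all illustrative assumptions of mine, not a description of any existing system.

```python
# Hypothetical mini-catalog: title -> set of feature labels.
# The labels are illustrative, not real cataloging data.
CATALOG = {
    "The Godfather": {"fiction", "mafia", "novel"},
    "Wiseguy": {"nonfiction", "mafia", "mobster perspective", "book"},
    "Goodfellas": {"mafia", "film", "mobster perspective"},
    "The Silence of the Lambs": {"fiction", "novel", "FBI"},
}


def browse(catalog, include=(), exclude=()):
    """Return titles that have every `include` feature and no `exclude` feature."""
    include, exclude = set(include), set(exclude)
    return sorted(
        title
        for title, features in catalog.items()
        if include <= features and not (exclude & features)
    )


# Start broad, then add a feature to narrow the result set.
broad = browse(CATALOG, include={"mafia"})
narrow = browse(CATALOG, include={"mafia", "mobster perspective"})
# Subtract a feature instead of adding one.
no_fiction = browse(CATALOG, include={"mafia"}, exclude={"fiction"})
print(broad, narrow, no_fiction)
```

The hard part, of course, is not this filtering step but getting consistent feature data across multiple databases in the first place.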
Wednesday, June 06, 2007
Partnership announced between CIC libraries and Google
The CIC Libraries are now participating in Google Book Search, announced today.
Sunday, May 06, 2007
DC and RDA - the beginning of a beautiful friendship?
An interesting announcement was made this past week that the DC and RDA communities will be working together to do the following:
- development of an RDA Element Vocabulary
- development of an RDA DC Application Profile based on FRBR and FRAD
- disclosure of RDA Value Vocabularies using RDF/RDFS/SKOS
The news isn’t exactly taking the library community by storm, but the commentary I have seen has all been of the “this is a good thing, I’ll follow this with interest” theme. But something bothers me about this plan, and I’m having trouble deciding exactly what it is I find, well, wrong in some way.
There’s nothing in the announcement that indicates the development of RDA proper will be affected by this work; in fact, the indication in the announcement that funding will be sought for the activities outlined implies the work is a long way off, likely entirely too late to have any real effect on RDA. This seems to me to be entirely backwards – trying to harmonize DC principles with RDA after the fact. Didn’t the DC community learn its lesson about the pitfalls of this approach when developing the Abstract Model, only realizing long after developing a metadata element set that it would benefit from an underlying model?
This general approach failed miserably with the DC Libraries Application Profile. There, the application profile developers wanted to use some elements from MODS, but weren’t able to because MODS doesn’t conform to the DCMI Abstract Model. So basically what the DC community said here was that application profiles are great, they form the fundamental basis of DC extensibility, but, oh yeah, you can’t actually use elements from any other standards unless they conform to the Abstract Model, even though there are no approved encodings for even DC itself more than two years after the Abstract Model was released. OK then. Way to foster collaboration between metadata communities.
But maybe the DC community will change paths and realize flexibility and collaboration get more users than intellectual rigor. It’s sad in some respects, but true. It wouldn’t be the first time there’s been a major shift in the direction of the DCMI. I don’t know that this is a good thing, but I’ll certainly follow it with interest.
Saturday, April 14, 2007
ALA Draft Digitization Principles
One big-picture issue I don't think is clear, however, is the label "digitization." The principles are for the most part not about the digitization (conversion from analog to digital format) process, nor do I think they should be. They're more about the properties of "digital libraries" as a whole, which have content that was once analog, content that is born digital, and perhaps even metadata about objects that aren't digital at all. These principles seem to describe systems and organizations more than just objects.
The "expand the scope even further" commentary is also particularly apt. Coming from ALA, the focus on "libraries" could, as one comment on the blog mentions, exclude other producers and maintainers of digital content, even others in the cultural heritage sector such as archives and museums. The direction I'd like to see these principles expand is related to (buzzword warning) interoperability. (Don't fall asleep--although that term is often empty in its usage, it really does describe some essential concepts.) The principles, as I read them, seem to focus inward, on developing and maintaining digital collections within a single institution or close consortium. But we have an opportunity now to move away from the traditional (another buzzword warning!) "silo" approach to libraries, and create systems that operate in a much more open fashion, promoting re-use and exchange of content and metadata in new and unexpected ways. The digital libraries we maintain shouldn't just be accessible through our well-designed interfaces intended for a human to interact with - we need to supplement that access with additional methods. These methods are constantly expanding, and it will be difficult for us to keep up, but we can't ignore them.
Thursday, April 12, 2007
Weighing in on speaker compensation
choose as invitees to engage in coffee conversations, eclectic dinner meetups, and learn from each of our communities through attending others' presentations? Certainly, not every invited speaker will be able to (or in some cases, want to) stay much longer than his or her talk, but we need models that encourage them to do so rather than making it difficult for them.
Saturday, March 24, 2007
Staying out of it, for now
Many of the discussions are ebbing to the point where it doesn’t make sense for me to weigh in, but even if they weren’t, my inclination is to sit back and watch rather than participate. Perhaps I’m a bit disillusioned thinking that my words will have very little impact. I’m glad the discussion is ongoing, and I still believe that most people out there are reasonable, and can behave themselves while engaging in professional discourse. But many of the current discussions have taken on an “us vs. them” bent, and when that happens I tend to stay away, avoiding situations where emotions and stereotypes start getting in the way of dialogue.
I am sad for feeling this way, but we all must make decisions about how best to spend our precious time. I believe I have a great deal to offer these discussions, particularly in expanding their scope beyond just “catalogs” to the types of digital library systems I’m involved with building and to types of materials not well-served by the MARC/AACR2/LCSH/etc. suite that’s the main focus of these discussions. I’ll probably post some thoughts to this blog (please do comment – I’m not afraid of discussion overall!) but for now I feel the time and intellectual investment of communicating in some of these other forums is best left to others. I’m hoping my aversion to participation is temporary, and that as I get caught up both with others’ thoughts and my own, I’m able to jump back into the community. See you soon.
Tuesday, January 30, 2007
Modular Vocabularies
Spilker, John D. “Toward an International Music Thesaurus” Fontes Artis Musicae 52/1, January - March 2005: 29-44.
The mention in the article of the debate about whether to include a facet for “content subjects” which would include “extra-musical associations” (i.e., topics, things the music is “about”) gave me pause, however. Looking more closely, I also see that they propose a “philosophies and religions” facet, ask the question if “distant” terms that appear in source vocabularies as compound terms relating them to music (e.g., “astrology and music”) should be retained in any way, and copy terms from an existing instrument vocabulary into their own instrument facet. This duplication of effort bothers me a great deal. I see this phenomenon fairly often—projects that try to do everything end up doing nothing very well. LCSH does this, trying to shove everything under the heading “subject” and not making very good distinctions between what type of subjects those terms are. Even in faceted vocabularies, I see communities (like the one described in this article) try to include everything that might be needed to index material for that community. The emerging Ethnographic Thesaurus, facing an enormous task in developing a vocabulary for a very large and diverse field, shows signs of this as well, but I know the editors are considering these issues as they move forward with development. I think this would work much better if these communities focused instead on only those facets that they have particular expertise in, and “borrowed” the rest from other communities.
There’s often an assumption in the library world that a record needs to use a single “subject” vocabulary. But as we move forward, surely that’s a constraint (even if it’s only perceived) we can break out of. There’s no reason vocabulary for different facets (notice how I’m assuming a faceted, post-coordinate structure here) has to come from the same vocabulary. Let’s leave each specific vocabulary to its experts, and not try to have musicians developing terminology for religion and astronomers developing terminology for book bindings.
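As a rough sketch of what I have in mind, a record could carry terms for each facet along with a note of which vocabulary each term came from. The vocabulary identifiers, facet names, and sample terms below are all hypothetical placeholders, not real published vocabularies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    vocabulary: str  # hypothetical identifier for the source vocabulary
    label: str       # the term as it appears in that vocabulary


# Illustrative record: each facet draws on a different expert community.
record = {
    "title": "Example score",
    "facets": {
        "instrument": [Term("instrument-vocab", "Violin")],
        "genre": [Term("music-genre-vocab", "Sonatas")],
        "topic": [Term("religion-vocab", "Pilgrimage")],
    },
}


def vocabularies_used(rec):
    """List the distinct source vocabularies a record draws on."""
    return sorted(
        {term.vocabulary for terms in rec["facets"].values() for term in terms}
    )


print(vocabularies_used(record))
```

Keeping the source vocabulary attached to every term is the design choice that matters here: it lets a system treat each facet according to the conventions of the community that maintains it.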
There are many details to work out, for example, the user implications of one vocabulary using singular forms by default and another using plural (there are standards for such things, but let’s be realistic about how many otherwise useful vocabularies we’d throw out of consideration because of the grammatical number of their headings), but there are technological means for dealing with them. I’d hate to see an inordinate focus on the (potentially many) small challenges derail the larger, necessary, move in a more flexible direction.