Sunday, December 21, 2008

Wow.

This poor blog has been sorely neglected lately, and for that I apologize, both to you and to myself. Life has gotten a bit too crazy and I'm still trying to find a way to set some boundaries. But in the middle of several big work deadlines and several personal deadlines (including a 2,000-mile road trip starting tomorrow, unexpectedly a day early!), I feel I have to take a minute to comment on this.

lcsh.info is no more.

Wow. I really don't know what to say. There's obviously a story behind this, and I know nothing of it. What I do know is that LC has been promising remote, machine-readable access to its authority files for YEARS now (SKOS is frequently mentioned, and, if memory serves, was cited [indignantly] during the lead-up to the release of the LC Working Group on the Future of Bibliographic Control's report as something LC was already working on, so stop harping on it already...), but such a thing, as Ed notes, has not come to pass. Taken in the context of the recent controversy over the change in OCLC's record use policy, one has to wonder what's up.

I know our library universe is complex. The real world gets in the way of our ideals. (Sure I can share my code! Just let me find some time to clean it up first...) But at some point talk is just talk and action is something else entirely. So where are we with library data? All talk? Or will we take action too? If our leadership seems to be headed in the wrong direction, who is it that will emerge in their place? Does the momentum need to shift, and if so, how will we make this happen? Is this the opportunity for a grass-roots effort? I'm not sure the efforts I see out there are poised to have the effect they need to have. So what next?

I mean, wow.

Saturday, September 27, 2008

This week's revelation

Too many interesting things going on, too little time to put them into words that others can read...

Something has been stewing in my head for a long time about RDA, and this week I'm at the OLAC/MOUG joint conference where the topic has come up a bit. RDA is supposed to be "made for the digital world." This is something I can completely get behind. But the drafts I've read (and I admit I gave up on them at some point, so maybe this has changed) don't seem to me to be actually accomplishing that. It's the right goal, but the products I've seen don't meet it. And then it occurred to me: by "for the digital world" I think what the RDA folks actually mean is "catalog digital stuff" rather than "create data that can be used by machines as well as people." I'm interested in the latter, so that's what I was assuming they were interested in. But I'm now wondering if that assumption was false. If we've had this terminology problem within our own profession for this long, how in the world are we going to communicate effectively with others?

Monday, July 07, 2008

I couldn't resist

I'm not one to participate in many blog memes, but seeing all the Wordle clouds out there, I just couldn't resist creating one for FRBR.


Wednesday, May 07, 2008

LC statement on RDA

I've long been on the fence with regard to the development of RDA - is it a transformative event or total folly? I think I've finally come to the opinion that RDA is overall a positive thing, and that it represents a necessary (although of course not perfect) step forward in the ongoing evolution of libraries.

What got me thinking about these issues again was a recent letter from Deanna Marcum at LC explaining why LC was issuing a joint statement with the National Library of Medicine and the National Agricultural Library outlining a testing and decision-making plan for determining whether or not to fully implement RDA. The letter and statement essentially say that wide participation in RDA development is a Good Thing (tm), yet so is substantive evaluation of it. Not much to argue with there. (Well, we always do find something to argue about, don't we?)

The stated goals of RDA, as well as its scope and underlying principles, speak to me strongly. I like the idea of a content standard written with FRBR principles in mind. The goal of making library description interoperate better in the current information environment outside of libraries is of course a laudable one. In this way, just by clearly stating these and a handful of others as the rationale behind the work being done, we've made a significant step forward. We're responding to the world as it exists around us today.

The world is changing, though. The environment today won't be the environment tomorrow. There's no indication, and perhaps even no real hope, that what we decide today will be right in a year, three years, ten. That's a reality we have to face, and I've decided I'm in the camp that says we have to move forward anyway, analyzing the risk but not being afraid of it. Looking at RDA through this lens, will it meet the goals it has outlined? Probably not. I see much in the current drafts that doesn't demonstrate the overall goals well. But we've never done this before, at least not in this way. We're learning. We're going to make mistakes. The stakes are admittedly high, but they're also high if we don't act. RDA has already evolved from community input, and I suspect it will continue to do so. Maybe it doesn't even stick around that long - maybe we learn enough from writing and trying to implement it that another round is warranted with some key needed improvements. We're investing many resources in this, but that's part of life as well. Many things don't pan out, and that's certainly not unique to the library world. I realize our resources are scarce, but they're going to be zero soon if we don't think creatively. I think RDA is an attempt to do that.

I'm still concerned that RDA as a content standard is stepping too far in the direction of a structure standard for my taste. It's explicitly defining "elements" whereas for content standards I like to think of "classes of elements" to help us remember that instructions in a content standard aren't necessarily a 1:1 match with fields in a data record - this is what enables us to mix and match content and structure standards as we see fit. But I'm the first to admit that the distinction between a structure and a content standard is an artificial one, and that any given standard can blur the line a bit. My concern still lingers, however - the RDA Scope & Structure document uses "elements" and "properties" interchangeably, but I believe these terms, even in the context given here, have very different connotations. We'll see, I suppose, whether my concerns are valid. Maybe I'm just being pedantic about terminology. Or maybe there's a fundamental conceptual problem here. I'm a pragmatist - I realize the only way we're going to find out is to try it.

Friday, April 18, 2008

A small, interesting, and potentially more powerful than it is now, example of user-contributed metadata

I was rudely awakened just after 5:30 this morning by an earthquake. The odd thing about this is that I live in Indiana, not exactly a hotbed for such things. This being my first earthquake (does that mean I'm a "survivor"? heh) I got up to look around and turn on the local news. There wasn't anything at all about it on the local news for about ten minutes, and then when it did appear for a long time it was just the anchors saying "We thought we felt an earthquake, but we don't know anything yet. Call us if you thought you felt an earthquake too."

Meanwhile I'd long since found http://earthquake.usgs.gov/eqcenter/recenteqsus/ and learned it was a confirmed magnitude 5.4 earthquake: noticeable and scary, but overall pretty routine. The cool part (even at 5:45 in the morning) was the "Did you feel it? Tell us!" link. This link led to a form where one reports basic information like your zip code, how long the tremor lasted, and what kind of damage it caused. One question asks if your refrigerator door opened and food fell out. At that point I realized just how minor an earthquake we'd had! But there are also some really interesting questions in there too - whether you were awake or asleep at the time, your level of fear, and what you did to protect yourself. I wonder what they're doing with this data - I can think of many interesting possibilities. I can think of even more possibilities if the USGS were to provide this data for use by others. (Maybe they do - this is very much outside of my area of expertise!) We could have a lot of fun with this one.

Based on the, (ahem), "classic" design of this part of the USGS site, one might conclude this feature has been around for a while. Good for them - collecting this sort of data from users is a fantastic idea.

Tuesday, March 25, 2008

Scholars and practitioners

I spent yesterday afternoon and this morning at an advisory board meeting for the IMLS Digital Collections and Content project, led by UIUC. I'm sitting next to Jeremy Frumkin, who was able to blog briefly about the project while we were sitting here, so I took that as a challenge that I should be writing up my thoughts on these issues as well. (Poor lonely neglected blog - if it were a house plant it would be all dried up! I'm not out of ideas by any means; what I am out of right now is energy.)

The IMLS DCC project is starting a new phase concentrating heavily on understanding what collection descriptions really are, and how they could be used to improve retrieval of items within them. It's a researcher-driven project, with most project leaders in the library school at UIUC. There are a few investigators representing digital library practitioners as well, and the advisory board reflects a similar diversity. In the LIS field in general, there's a pretty wide gulf between researchers and practitioners, but I've always considered UIUC one of the places where the situation is better than most. This project shows some of that separation - looking at the problem from both the theory and practice perspectives, and hoping to meet in the middle along the way. I see many potential pitfalls in this, but I also see paths that could work.

Overall, I'm very interested in (and concerned about, because I don't see a lot of good activity in this area!) how we move practitioners (such as myself!) towards more consistent and useful work without all of us having to become researchers in the theoretical realm. I'm very interested in the theoretical research in areas like the ones DCC is studying, and see value (and fun!) in figuring things out just for the purpose of figuring them out. But we need better bridges between that theoretical work and how the greater understanding gained from it could be used to build better products. I think the IMLS DCC project is really trying to do that, and hope that the practitioners on the project staff (and advisory board, like me!) are able to help them reach the practitioner community that needs to hear it. I see this with my own work, thinking, "well, we published a paper, what more do they want?" but I've found that's not enough. Some combination of publishing, conference papers, informal distribution like listservs and blogs, plus the crucial step of showing a concrete (if only a test) system that illustrates a research result is necessary. And possibly other mechanisms as well - I don't claim to know exactly how to do this; I'm just muddling through like the rest of us. But I think it's worth our time to work on.

Saturday, March 01, 2008

Woohoo!

The metadata book I recently co-authored is now available. And only four months late. :-)

Monday, January 28, 2008

Metadata interoperability

I'm not especially handy. (Get up, those of you who just fell on the floor laughing at the understatement of the century. Stop it. Right now. I know who you are.) Inspired by a recent minor home repair (which would probably be a trivial repair for you normal folks!) involving a screwdriver (flathead, if you were wondering, I'm not quite that inept), I've been thinking about how to explain metadata interoperability in terms of tools.

I've long held that achieving interoperability by prescribing a single way of doing things is unsustainable, even at a relatively small scale, and it seems to me those sets of many-sized gadgets can show us a path forward. Wrenches, bolts, sockets and the like are not all the same. Rather, a bolt is chosen based on what it needs to do - what functions it needs to support and the environment in which it needs to fit. The same is true for descriptive metadata standards. The ones used for a specific class of materials need both to match well with the materials themselves and to be supported in the institutional environment.

On the other side, how to deal with a bolt may not be immediately obvious, but it's not all that hard to figure it out. There are many options for what size socket wrench would be needed to tighten or loosen it. It would be nice if the bolt clearly stated what size it was, and this happens in some cases but certainly not in the majority. The wrench needed might be of the type measured in inches or the type measured in millimeters. We can consider these akin to the different approaches to description taken by libraries and archives. A practiced eye can examine the bolt and guess at the right size wrench to try. They'll likely get close, maybe with one or two missteps. With trial and error, a novice can find the right wrench as well. I believe the same is true for a system using metadata from an outside source - a skilled human can tell a lot about how best to use it from a quick glance, and trying various tactics out on it will inform how well various choices work for both an expert and a novice. With time and expertise we can transfer some of these evaluations to a system rather than a human (we can do some of this now; for example, determining which metadata format from a predefined list is in use can be easily automated), but the process is still the same.
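To make the parenthetical about automated format detection a bit more concrete, here is a minimal sketch in Python. It assumes records arrive as XML and that the namespace of the root element is enough to identify the format, which is true for many common library formats but not all. The function name `detect_format` and the small namespace table are hypothetical, illustrative choices, not a list taken from any particular system.

```python
import xml.etree.ElementTree as ET

# Hypothetical, illustrative lookup table: root-element namespace -> format name.
# A real system would need a longer list and fallbacks for records without namespaces.
KNOWN_FORMATS = {
    "http://www.loc.gov/MARC21/slim": "MARCXML",
    "http://www.loc.gov/mods/v3": "MODS",
    "http://www.openarchives.org/OAI/2.0/oai_dc/": "Dublin Core (oai_dc)",
    "urn:isbn:1-931666-22-9": "EAD 2002",
}


def detect_format(xml_text: str) -> str:
    """Guess a record's metadata format from its root element's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""
    # Anything not in the predefined list is left for a human to look at.
    return KNOWN_FORMATS.get(namespace, "unknown")


sample = '<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo><title>Example</title></titleInfo></mods>'
print(detect_format(sample))  # MODS
```

This is the "practiced eye" step handed to a machine for the easy cases only; deciding how best to use the record once its format is known is still the harder, human-informed part.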

The point as I see it is that for these types of tools there are many options, but each of them is well-defined and well-documented. The same is true for metadata. As we gain more experience, we work towards defining various approaches that work for a given class of materials in a given environment. We then don't need to re-evaluate everything every time we start something new; we can refer to existing knowledge to see which approach has proven useful in a similar case.

Maybe this analogy fails once you look closer (especially on the tool side - what's here pretty much represents my entire understanding of those types of things), but, like any analogy, it's bound to break down at some point of examination. But I've been pondering a bit and for now I still think it's useful. Feel free to argue with me, though!

Sunday, January 20, 2008

Musings on RDA, LC Working Group Report, and various other random things

I’ve been staying on the outskirts of the flurry of activity the last few months surrounding RDA and the Library of Congress Working Group on the Future of Bibliographic Control. Various things have been swimming around in my brain during this time, and I think now they’re finally ready to come out. I don’t know that I have any conclusions or suggestions for future directions, but that’s what I see as the function of this blog – to think through things that may or may not end up anywhere interesting.

I read the LC Working Group on the Future of Bibliographic Control’s draft report, and submitted comments in time for the Working Group’s consideration, via a web form designed more for quick email questions than for substantive feedback. I didn’t post my comments on the blog, as many others did, as I felt my comments were pretty boring – they mostly were of the type “this paragraph seems to say x, but I wonder if you really meant to say y….” Overall, I was impressed with the report, and thought it represented an admirable vision for the directions in which libraries should be heading. It also struck me as largely avoiding library politics, although I thought it was odd that a specific reference to FAST disappeared between the draft and final reports – I wonder what that was about? I liked the report’s boldness in pushing for attention to special collections, and the tough questions about the continued utility of MARC and LCSH.

But I, like many others, found a bit of schizophrenia in some of the specific recommendations. The report is not afraid to take a bold stand on MARC, but stops well short of recommending a move away from tens of thousands of distributed copies of bibliographic records, and (new in the final version, I think) questions RDA’s move away from ISBD. The report recommends moving quickly to work on new bibliographic frameworks but even more forcefully says that RDA should wait before proceeding. It provides many recommendations on how to improve moving information into and out of the catalog but provides little in the way of rethinking the function of the catalog itself. I believe some of this inconsistency is the result of trying to address comments the WG received (although I don’t see any changes related to any of my comments in there!), but most of it is probably due to the fact that this is a committee effort, written and revised on a short schedule. The biggest disappointment for me in the final report was that my favorite recommendation from the draft lost all of its power. In the draft report, one of the recommendations relating to LIS curricula described some extremely technical and theoretical topics as essential to offer. I believe cultivating individuals with both system and information expertise is the single most effective thing we can do to ensure libraries play a part in the future information environment. In the final report, this recommendation was sanitized to simply say LIS curricula should include “advanced knowledge and topics.” Bleah. That could mean anything.

One other thing regarding the LC WG report: representatives from both Google and Microsoft served on the Working Group, but I see little if any evidence in the report that these individuals contributed points of view that haven’t been making the rounds within the library community already. That’s unfortunate. We need some outside points of view in this community.

I know the term “bibliographic control” has been questioned in relation to this report. Roy Tennant suggests “descriptive enrichment” instead. I recognize the problems with bibliographic control – it sounds so authoritarian in the face of the open vision the report outlines. But all labels are words, and words have baggage. I’m not clever with names (my dog is named Daisy, if that gives you a sense of how un-creative I am in this area), but I’m skeptical that any brief name could capture what we’re trying to do here. “Descriptive enrichment” to me calls up images of armies of humans manually adding things to records, an image I think we don’t want to be promoting. So I’ll remain neutral on the name issue – if someone comes up with a new one that folks like, I’d be happy to start using it. But I’m unlikely to be the one thinking that new label up.

I read many of the responses to the LC WG report that appeared on blogs, and found myself agreeing with many of the points made, and disagreeing with others. Pretty standard reaction, I suspect. I found OCLC’s response quite odd, however. It had the general tenor of “we’re doing all that stuff already, don’t worry, just trust us…” while at the same time oversimplifying the issues in a way I found totally inappropriate for a response to a committee of experts. For example, the OCLC response touts its FRBR work as testing the WG didn’t realize was happening, but it glosses over the fact that the Work-level clustering and other FRBR-like things OCLC has been doing aren’t true FRBR implementations. This community needs clarity and hard truths on these issues right now, not something that’s been reviewed by marketing. OCLC Research and RLG Programs have been doing extremely interesting things recently, but few if any of them make their way to the productized mainstream of OCLC in ways that promote the state of the art or even fit well with the vision outlined in the LC WG report. I hope OCLC takes the report’s recommendations to heart in the same way LC and the rest of us are trying to do.

I found RDA’s reaction (or lack thereof) to the LC WG report to be of note as well. The folks behind RDA (the “Committee of Principals” for those of you in the know for such things) have on the RDA web site a response dated the same day the WG closed its call for comments. Presumably they submitted this document as an official comment in the appropriate time frame. The response, as the preface to the final LC WG report notes, smacks of “we’re too far along to stop now,” which in my mind is equivalent to “we/he/she have worked really hard, so what we came up with must be good,” which I believe is completely and totally bogus. It also lays the guilt trip on LC – saying basically “we’d hate to lose your input.” What the response doesn’t do is address directly (aside from listing a few pseudo-FRBR implementations that one can’t imagine the LC WG didn’t know about) any of the concerns raised in the report. It looks to me more like people talking past each other and being defensive than trying to find common ground. Of course, the RDA response says they won’t be stopping development, which is no surprise at all. (Really, did anyone think they would? Wishful thinking doesn’t count.)

Through all this, I remain agnostic about RDA. I figure at some point I’m going to have to form an opinion, but frankly, I haven’t had the time to invest to develop an informed one. I haven’t read the last set of drafts (released December-ish), and with previous drafts I had trouble devoting the mental energy to them to see the forest of general vision and effectiveness for the trees of specific rules. I like the idea of more explicit connections to FRBR being behind the new organization, but it looks awfully complex. FRBR of course is complex, but I can’t help wondering if there’s another way to make the connection. I also understand (but again, haven’t seen myself) that the new drafts and/or supporting documents use terminology from the DC Abstract Model, including “literal value surrogate” and the like. I’m as intimidated by the terminology as the next person, but I do think it’s worth it to introduce some intellectual stringency to this process. I’m just not sure how to do that and still make the documents accessible.

Martha Yee has put online a set of cataloging rules she’s developed as a response to what seems to be the insanity surrounding us. I’ve long thought Martha was a clear voice in pushing against the book-centric focus of the cataloging community and realizing the importance of the display of information to users (in addition to just how we store it), but I’ve found myself disagreeing strongly with some of her more recent work that seems not to understand the state of the art with regard to search engines, information retrieval, or artificial intelligence. I haven’t read her cataloging rules yet, but I’m encouraged that she’s come up with some sort of concrete alternative (rather than just complaining, like the rest of us do), and that she seems to be working towards an RDF model for her cataloging rules – bravo! I think any new set of rules, to be successful, however, needs to be written to take advantage of current machine processing technologies. Not having read either the latest RDA drafts or Yee’s rules, I can’t say whether they do this or not. One can only hope.

And hope is where I am for the future of libraries. We have a lot going on right now in libraries, and I consider that a good thing. To use an old adage, we can’t be so afraid we’ll make a mistake that it prevents us from doing anything at all. Because we will make mistakes. We’re human. No matter how many people we involve, no matter how many levels of review we have, there will be things we try that don’t work out. If we realize that ahead of time we’ll be able to recover and try new things that will work. We’ve done so much already, and we have in our community an enormous number of insightful, dedicated individuals with a vision for where we’re going. Now we just have to find a way to let that vision emerge from the bureaucracy and the power of inertia.