Sunday, February 27, 2011

So I had a dream last night where Lorcan Dempsey had a really good idea that I want to riff on

So I had a dream last night where Lorcan Dempsey had a really good idea that I want to riff on. Of course, it was "dream Lorcan" who got my mind rolling which I guess is essentially me and not him. Which is weird. I also don't believe this idea is really new, though at the moment I can't lay my brain on exactly where it is being discussed, at least in the way I'm thinking about it. (Please show me those places if I don't know about them or learned about them and have forgotten all but the nugget of their ideas, which obviously I'm appropriating just now.) I'm not sure what this all says about me, but that's never stopped me in the past from just going ahead and talking, so here we are. I've also been fixated on this idea enough that I'm writing this pre-coffee and pre-the rest of my usual weekend wake up routine, so we'll see how that goes. Moving on now.

I've recently switched jobs within libraryland (loving the new one, thanks!), and am not doing on-the-ground metadata work any more. It was time for a shift, though I do definitely miss working with metadata issues. But that shift has given me some distance to reflect more on the overall state of the library metadata landscape. In this dream, I was setting up at a table in the front of the room for some conference or another. The session was a good ways away from starting, and for a while it was only the other speaker and me in the room. We were supposed to be presenting about really big picture issues in libraries, and we were chatting a bit about what we were going to say, about the overall dearth of big ideas in libraries (especially in metadata), and about our own personal insecurities about what we were about to say in the session. Enter Lorcan. He heard us, and entered the conversation. He had some amazing ideas that the other speaker wrote down and said "oh, great, I should definitely talk about that!" Of course, since this was a dream, I have no recollection of what those were, but in the dream they were brilliant, trust me. And then he said it. "You know, we really need professional help."

What dream Lorcan meant by that is that libraries are entering new areas of work, trying out new ideas, and talking the big talk about interoperating beyond our borders, but that we're really not very good at all about truly working with others outside of our own culture to do this, or to do anything beyond taking inspiration from something going on outside and then working only inside to try to appropriate it. And, in my opinion, that's a problem.

Of course we want to take ideas from elsewhere and make them our own. And of course, "we" understand our culture, mission, goals, ideals, etc., so we need to be core players in adopting ideas from other communities, and in many cases we can indeed do it on our own. But this idea has me thinking that doing that alone for some critical ideas is too insular an approach. Much of library culture is good, but not all of it is. Sometimes when we try to work on ideas and services from other communities we corrupt them in ways that we really shouldn't. Sometimes when we adapt an idea to work for our core values and vision we also impose some of our baggage on that idea. And we need someone to call us on that, but there's nobody without that baggage who is close enough to do it.

I see library metadata as a prime example of this problem. Take the RDA Vocabularies work (aka "try to turn RDA into something that allows library metadata to interoperate outside libraries.") [1]. There's a core team of great people working on this that really understand the issues from the library perspective, and are able to think beyond many of the constraints libraries place on ourselves. I share their frustration with the lack of engagement in this aspect of RDA within the library community, and with them desperately seek ways to raise awareness within libraries of the issues raised by this work. However, the work is being done by library people learning about other communities' expectations and models, and attempting to map library metadata practices on top of them. They're making progress: identifying areas that might be problematic beyond libraries such as the explicit connection of many RDA properties to FRBR entities, and the packaging together of different data elements into a publication statement. But these are details. I worry that there are more fundamental mismatches that we're not seeing because we don't have the right people at the table. And by this I'm not exactly thinking FRBR vs. non-FRBR, though indeed that is going to be an issue. I'm thinking somehow higher level than that, but I'm having trouble articulating what that is exactly. What I think we also need is non-library people learning about library expectations and models to help us more effectively work towards library metadata being useful "out there." I won't say there are none of these people and partnerships today, but where they do exist I believe they're the result of individual interest, or small-scale partnerships set up through personal connections. What we're missing is the high-level engagement of the two communities. Libraries in the large sense need to be visible enough to cause other communities to be interested in the data we could potentially provide.
If this is going to work we need both organizational weight and grassroots efforts that will eventually meet in the middle. Right now we only have the grassroots stuff.

And when I talk about outside, professional help, I most definitely DO NOT mean a traditional consultant that's paid to come in, talk to some people, and produce a report on some relatively short time frame (a year or less). These folks rarely understand the underlying issues of the situation they're asked to comment on, and typically produce reports that are more about their own agendas than about the work they're analyzing. What I mean is more of an ongoing dialogue and partnership, and the library community has to find ways to get others interested in spending time, energy, and resources on that work. More on that in a moment.

Another area where this might have worked better is the Linked Library Data Incubator Group from the W3C. This is very much an "outside" thing, which of course is promising. I don't know the deep history of this group, but it seems to me to have been put together largely from outside interest and not from library leadership. There are LC folks involved, but I get the sense that's more individuals rather than the organization as a whole. They're at the table (thankfully) but I haven't heard that LC as an organization is a real leader for this initiative. It's as if LLD, id.loc.gov, and the MADS RDF Ontology are reduced to experimental status rather than true strategic directions for LC (at least beyond one department). LC also hasn't put much weight at all behind the RDA Vocabularies work. They're kind of milling about rather than leveraging their position. And beyond LC (and a bit of OCLC, which suffers from the same problem) the LLD group looks to me like outsiders and theorists. They've done an admirable job of publicly calling for use cases, but it seems to me this effort is too far on the other side of the continuum from the RDA Vocabularies work. They're to be absolutely commended for putting this together at all, don't get me wrong. I totally believe in what they're doing. But I don't yet see this as the true strategic partnership geared at getting the right people in the room to find all the places where we say the same word but mean two different things. Maybe we just need to try a few more of these types of efforts before we really understand what the right makeup is.

But if LC and OCLC are at this cocktail party but mostly talking to each other near the wall rather than having an impassioned theoretical discussion with someone they just met, DLF skipped the reception entirely to go see a show. They're not even in these discussions, at least not as far as I have heard. (Anyone want to correct me?) Some of the best public/private partnerships in libraries are represented by the DLF community. I believe the big, looking outside issues facing library metadata are core to what DLF does, but even if others disagree, surely we can learn from the mechanics of these types of partnerships that have been set up for non-metadata work, using the collective knowledge of DLF. Now I'm thinking I should step up or shut up and say specifically what DLF should do in this area. Good point, I'll definitely work on that, and have some inklings of ideas that aren't quite ready for a forum like this yet.

Here's the core of what I'm saying: if it's time for a sea-change in library metadata, we have to truly shift our thinking. It's not just about making RDA data RDF-ified (though we do have to go through that exercise as one way to expose differences between the models). It's going further than that to truly understand where our assumptions collide. And we have to do that from both ends: library people learning about other models, and people with other models learning about libraries. I don't believe we're facilitating enough of the latter. We have lots of smart library people working on this problem, but they're primarily working by reading and experimenting on their own. We also need more opportunities for actual engagement between our community and others. Right now basically we're writing each other memos rather than actually hashing it out.

And this is all because dream Lorcan told me that I (well, we) need professional help. Aren't dreams funny?

[1] Hillmann, Diane, Karen Coyle, Jon Phipps, and Gordon Dunsire. (January/February 2010) "RDA Vocabularies: Process, Outcome, Use." D-Lib Magazine 16, no. 1/2. http://www.dlib.org/dlib/january10/hillmann/01hillmann.html

Monday, June 21, 2010

Visualization of the Metadata Universe

I know, I know, this poor blog is basically abandoned. I really do want to find more time to spend on it. But in the meantime, I wanted to post here an announcement I just sent out to a bunch of places. I'm pretty excited about this!

The sheer number of metadata standards in the cultural heritage sector is overwhelming, and their inter-relationships further complicate the situation. A new resource, Seeing Standards: A Visualization of the Metadata Universe, is intended to assist planners with the selection and implementation of metadata standards. Seeing Standards is in two parts: (1) a poster-sized visualization plotting standards based on their applicability in a variety of contexts, and (2) a glossary of metadata standards in either poster or pamphlet form.

Each of the 105 standards listed is evaluated on its strength of application to defined categories in each of four axes: community, domain, function, and purpose. Standards more strongly allied with a category are displayed towards the center of each hemisphere, and those still applicable but less strongly allied are displayed along the edges. The strength of a standard in a given category is determined by a mixture of its adoption in that category, its design intent, and its overall appropriateness for use in that category.

The standards represented are among those most heavily used or publicized in the cultural heritage community, though certainly not all standards that might be relevant are included. A small set of the metadata standards plotted on the main visualization also appear as highlights above the graphic. These represent the most commonly known or discussed standards for cultural heritage metadata.

Work preparing Seeing Standards was supported by a professional development grant from the Indiana University Libraries. Content was developed by Jenn Riley, Metadata Librarian in the Indiana University Digital Library Program. Design work was performed by Devin Becker of the Indiana University School of Library and Information Science, and soon to be Digital Initiatives & Scholarly Communications Librarian at the University of Idaho.

I hope this resource proves to be helpful to those working with metadata standards in libraries, archives, museums, and other cultural heritage institutions.

Wednesday, September 23, 2009

Completely backwards

Emails, blog posts, and tweets are flying by regarding OCLC's recent message to OAI-PMH data providers asking them to agree to a set of Terms & Conditions allowing OCLC to include data harvested via OAI-PMH in both free and toll services that OCLC provides. We do love our drama in the library community!

I agree with the predominant theme that this has all been handled very poorly, but I think the biggest problem lies somewhere else entirely. OCLC has set this whole system up completely backwards. OAI-PMH is a mechanism to share metadata widely, without having 1:1 agreements between data providers and service providers (harvesters). The entire point is to reduce the overhead of sharing. OCLC asking each data provider to check their status and preferences against OCLC's ideal is the wrong way 'round! The way this really should be done is with data providers making clear statements about what can and can't be done (per both copyright and license) with the metadata they're sharing. And, oh, look, OAI-PMH already lets data providers do that.

To be fair, there's lots of data provider software out there that doesn't support this optional part of the profile. Still others are using software that provides for this but they don't go to the effort to use it. My own repository doesn't have this mechanism in place. (Working on it, I promise!) But this really is the way it has to be for any kind of open data initiative to work. I as a data provider put my metadata (and content if I can!) up, make it clear what copyright terms apply and what license terms I place on its use, and let the sharing begin. The burden must be on the service provider (or harvester, OCLC/OAIster in this case) to determine if the use they want to put the data to conforms with my terms. Service providers should bear the load of managing multiple data providers - it's part of the work they have to do to set up the service. If they want the free stuff, they have to do the work to figure out if their efforts are kosher. OCLC must be responsible for protecting themselves from lawsuits stemming from their use of stuff they're not supposed to, rather than transferring that responsibility to us as data providers.
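Here's a rough sketch of what that division of labor might look like from the harvester's side. It assumes the data provider attaches a rights statement to each record in an "about" container, with element names and namespace as I understand them from the OAI guidelines for conveying rights expressions; the sample record, the list of acceptable licenses, and the function names are all made up for illustration. Refusing records that carry no rights statement at all is one possible policy choice, not something the protocol requires.

```python
import xml.etree.ElementTree as ET

# Namespace for rights statements, as I understand the OAI rights
# guidelines (treat this and the element names below as assumptions).
RIGHTS_NS = "http://www.openarchives.org/OAI/2.0/rights/"

# Hypothetical record as a data provider might expose it.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:item-1</identifier>
  </header>
  <about>
    <rights xmlns="http://www.openarchives.org/OAI/2.0/rights/">
      <rightsReference ref="http://creativecommons.org/licenses/by/3.0/"/>
    </rights>
  </about>
</record>
"""

# Hypothetical harvester policy: the licenses this particular service
# has worked out that it can comply with.
ACCEPTABLE_LICENSES = {
    "http://creativecommons.org/licenses/by/3.0/",
    "http://creativecommons.org/publicdomain/zero/1.0/",
}


def declared_rights(record_xml):
    """Return the rights references a data provider attached to a record."""
    root = ET.fromstring(record_xml)
    refs = root.findall(f".//{{{RIGHTS_NS}}}rightsReference")
    return [r.get("ref") for r in refs if r.get("ref")]


def may_harvest(record_xml, acceptable=ACCEPTABLE_LICENSES):
    """The harvester, not the provider, decides whether its use conforms."""
    rights = declared_rights(record_xml)
    # Policy choice for this sketch: no statement means no assumed permission.
    return bool(rights) and all(r in acceptable for r in rights)


if __name__ == "__main__":
    print(declared_rights(SAMPLE_RECORD))
    print(may_harvest(SAMPLE_RECORD))
```

The point of the sketch is where the check lives: the provider states its terms once, and every harvester runs its own version of may_harvest against them.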

But I have to temper the other side of this too. I was a member of the group that developed this set of recommendations, urging data providers not to put undue restrictions over reuse of their metadata. I really believe this is the right way to go. Of course we as data providers are sometimes under legal (copyright, contract, etc.) constraints that limit what we can do with our metadata. We have to honor those agreements. But for the vast majority of our stuff, we can share without restriction if we choose to. Giving up control is part of sharing, and we have to learn to live with that. Blessing certain uses and banning others is a dangerous business, and one that doesn't mix very well with the open sharing of information libraries are all about. As the Creative Commons recently found, even "non-commercial use" isn't a very straightforward issue, so I don't think it serves us well to fall back on that old standby. Freedom is about taking the inevitable small amounts of bad with the overwhelming good, and I really do believe those principles apply to information sharing as well. Let's spend our efforts on sharing more and better information, and less on meting out what we do have.

Monday, July 27, 2009

Thoughts on FRSAD

I don't usually publish my individual comments on things sent out for review within our community, but I've decided to make an exception for the FRSAD report. I'm actively working with a FRBR implementation (and trying to take in as much of FRAD as we can), and anything I can do to help push FRSAD (FRSAR? what's in a name? ha - there's got to be a FRAD joke in there somewhere...) to be useful to the work I'm doing I see as a good thing. So here are the comments I sent in through official channels.

-----------------

In short, I think good work has been done here but it doesn't meet my needs as someone working diligently (and actively implementing FRBR and FRAD) to re-imagine discovery systems in libraries.

While I am a great believer in the power of user studies to inform metadata models, I believe inappropriate conclusions have been drawn here. It doesn't surprise me at all that users had trouble sorting actual subjects into categories such as concept, object, event, place. But that doesn't mean our models shouldn’t make that distinction. Users wouldn't be able to distinguish between Work/Expression/Manifestation/Item, either, but those are still useful entities for us to use underlying our systems.

The draft report rightly notes that the concept/object/event/place division is only one way of looking at it, and that other divisions are possible, such as those outlined by Ranganathan and the <indecs> framework (which seems to be basically abandoned?). But that's the very essence of a *model* - to pick one of many possible representations and go with it, in order to achieve a purpose. The fact that competing interpretations are possible is not a rationale for declining to select one that can advance the purpose of the model (even taken together with user studies showing users don’t gravitate to any one specific division). By choosing concept/object/event/place (or Ranganathan's model, or <indecs>, or any other option) we can delve deeper into the modeling we need to do and provide a way forward for our discovery systems. By refusing to do so, we don't advance our case the way we must.

The thema/nomen structure outlined here is very useful. However, I believe strongly the report should not stop here. Going further is often stated here as "implementation dependent" but I think there is a great deal of room for the conceptual model to grow without venturing into actual implementations. Certainly FRBR and FRAD take that approach.

In general, the thema/nomen structure could apply to any attribute or relationship under vocabulary control. There is great (and unfortunately here unexplored) potential for this model to apply beyond simply aboutness. Limiting it in this way I believe is a disservice to those of us who are attempting to use these models to reinvent discovery systems.
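To make that last point a bit more concrete, here is a minimal sketch of how I picture it, not anything taken from the report itself. A thema is the thing, a nomen is a name it is known by in some scheme, and the relationship connecting a work to a thema is just another value, so "has subject" is only one of the roles it could carry. The class names, the "scheme" attribute, the identifiers, and the sample labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Nomen:
    """A name by which a thema is known (hypothetical attributes)."""
    value: str
    scheme: str  # which vocabulary or language the name comes from


@dataclass
class Thema:
    """The thing itself, independent of any one name for it."""
    identifier: str
    nomens: list = field(default_factory=list)

    def add_nomen(self, value, scheme):
        self.nomens.append(Nomen(value, scheme))

    def labels(self, scheme=None):
        """All names, or only those from one scheme."""
        return [n.value for n in self.nomens
                if scheme is None or n.scheme == scheme]


@dataclass
class Statement:
    """Links a work to a thema through any controlled relationship."""
    work: str
    relationship: str  # "has subject" is one option among many
    thema: Thema


if __name__ == "__main__":
    glass = Thema("thema:0001")
    glass.add_nomen("ale glass", "scheme-a")
    glass.add_nomen("Bierglas", "scheme-b")

    # The same thema/nomen pair serves aboutness and other roles alike.
    about = Statement("work:essay-42", "has subject", glass)
    is_a = Statement("work:object-7", "has object type", glass)
    print(about.thema.labels("scheme-a"), is_a.relationship)
```

Nothing in the Thema or Nomen classes knows about aboutness; only the relationship value does, which is why I think the structure generalizes.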

I'm concerned about the significant lack of cohesion between the FRBR, FRAD, and FRSAR reports. They show signs of having been generated independently by different groups with different interests over a long span of time. This limitation definitely needs to be overcome if these reports are to be useful as a whole for the community. Each could be used on its own, but we need a more coherent group. In fact, the thema/nomen structure in the FRSAD draft isn't really all that different than the (whatever entity)/name structure presented in FRAD. The three reports could be made much more cohesive - what's written here seems to ignore FRAD in particular. I believe this is a missed opportunity. I think the most significant mismatch between the three reports is where they draw the boundary for how far a "conceptual model" should go.

On a higher level note, the report reads more as an academic paper outlining alternative options rather than providing a straightforward definition of the conceptual model. I respect the background work done here, and believe it needs to be done. There's a lot of room for papers like that in this environment; however, this report series needs to serve practitioners better and stick closer to the model.

On a more practical note, in the report the Getty AAT is often referred to by example. Yet most of the facets in the AAT bring out the "isness" (which in the introduction is explicitly described as out of scope) rather than "ofness" or "aboutness". For example, on p. 45, #7 under "select," "ale glass" in AAT is intended to be used for works of art that ARE ale glasses, not works (presumably textual) that are ABOUT ale glasses. This internal inconsistency is a serious flaw in the report.

I'm certainly not one to promote precoordinated vocabularies, but they exist in library metadata and we must deal with them. It's unclear to me from this report how these fit into the model proposed.

Sunday, May 03, 2009

DLF Aquifer Metadata Working Group "Lessons Learned" report available

That moment when a long-term project comes to an end is always simultaneously filled with relief and sadness. Relief in that new opportunities can be embraced and a pretty package placed around what was accomplished, with appropriate rationales for what didn't make its way into the package. Sadness in that productive and creative working relationships come to a close or change, and that there is always more to be done that cannot for practical reasons be embarked upon at this time.

The Digital Library Federation's Aquifer initiative wrapped up this spring, and it has caused me to experience that moment of relief and sadness. (Well, to be honest, several moments!) I've been involved with Aquifer from the beginning, and during that time my relationship with it evolved from skepticism to "just jump in and see what you can do" to "bite off one reasonable chunk of a problem and do your best to make this chunk work with other chunks." A report the Metadata Working Group just released, "Advancing the State of the Art in Distributed Digital Libraries: Accomplishments of and Lessons Learned from the Digital Library Federation Aquifer Metadata Working Group," reflects that last approach, attempting to place our work in an ever-evolving context. There is much more that could have been done, and the limitations and benefits of a volunteer committee to do work like this are more evident to me now than ever. Nevertheless, I'm proud of the work this group did. Congratulations to all involved on successfully navigating through our many tasks.

The message I sent out about this report to various listservs included the following "thank you":
The Aquifer Metadata Working Group would like to thank all who have been involved with the initiative, including current and past Working Group members; the Aquifer American Social History Online project team; participants in ground-breaking precursor activities such as the DLF/NSDL OAI-PMH Best Practices; individuals and institutions who tested, implemented, and provided feedback on the Metadata Working Group's MODS Guidelines and other work products; and of course DLF for its ongoing support. It's been a wild, educational, and wholly enjoyable ride!
I can't state with enough gratitude the role the community has played in what the Aquifer Metadata Working Group was able to accomplish. I like to talk with those thinking of entering the digital library field about just how much of our work is figuring it out as you go - we're constantly refining models to apply to new types of material and take advantage of new technologies. My absolute favorite part about working in this area is navigating the tricky path of effectively building on previous work while pushing the envelope at the same time. I hope the Aquifer Metadata Working Group's contributions continue to be useful as building blocks for a long time to come.

Thursday, March 05, 2009

Must Watch! Michael Edson: "Web Tech Guy and Angry Staff Person"

I heard Michael Edson (Director of Web and New Media Strategy for the Smithsonian) speak at the IMLS WebWise conference last week. He delivered an astonishingly good talk centering around an animation entitled "Web Tech Guy and Angry Staff Person." It's a riot, and the animation sets a lighthearted attitude that reinforces his disclaimer that he's not poking fun or diminishing the very real tensions cultural heritage institutions face as our communication, collection, and even the dreaded B-word (business!) models change underneath us. Instead, I believe it's effective in using exaggeration to highlight some underlying issues and think intelligently about what it takes to say we CAN do something rather than taking the easy road and saying no. We can't just dismiss the challenges - understanding them will help us address them.

Sunday, March 01, 2009

Google vs. Semantic Web

On a number of fronts recently I've been thinking a bunch about RDF, the DCMI Abstract Model, and the Semantic Web, all with an eye towards understanding these things more than I have in the past. I think I've made some progress, although I can't claim to fully grok any of these yet. One thing does occur to me, although it's probably a gross oversimplification. The difference in the Semantic Web/RDF approach from the, say, Google approach is this: is the robustness in the data or is it in the system?

The Semantic Web (et al) would like the data to be self-explanatory, to state explicitly what it is describing, with explicit reference to all the properties used in the description. The opposite end of the spectrum is systems like Google, which assume some kind of intelligence went into the creation of the data but don't expect the data itself to explicitly manifest it. The approach of these systems is to reverse engineer that data, getting at the human intelligence that created it in the first place.
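A toy illustration of the two ends of that spectrum, and very much a simplification: the first half looks up an answer in data that states its own properties (using the real Dublin Core terms "creator" property URI, but a made-up subject URI and made-up triples), and the second half guesses the same answer from free text with a deliberately naive pattern. The pattern is my own stand-in for "reverse engineering" and has nothing to do with how Google actually works.

```python
import re

DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Robustness in the data: every statement names its property explicitly.
# (Hypothetical subject URI and values.)
EXPLICIT_TRIPLES = [
    ("http://example.org/book/1", DCTERMS_TITLE, "Moby Dick"),
    ("http://example.org/book/1", DCTERMS_CREATOR, "Herman Melville"),
]


def creator_from_triples(triples, subject):
    """No guessing needed: the data says which value is the creator."""
    for s, p, o in triples:
        if s == subject and p == DCTERMS_CREATOR:
            return o
    return None


# Robustness in the system: the data is just text a person wrote,
# and the system tries to recover what that person meant.
FREE_TEXT = "Moby Dick, by Herman Melville. First published in 1851."


def creator_from_text(text):
    """Naive heuristic: capitalized words following 'by' are a name."""
    match = re.search(r"\bby ([A-Z][a-z]+(?: [A-Z][a-z]+)+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(creator_from_triples(EXPLICIT_TRIPLES, "http://example.org/book/1"))
    print(creator_from_text(FREE_TEXT))
    # The heuristic is only as good as its assumptions about the text:
    print(creator_from_text("Stand by Your Man is a song"))
```

Both approaches get the first example right, but only the heuristic can be fooled by the last one, and only the triples required someone to encode the property up front.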

The difference is one of who is expected to do the work - the system encoding the data in the first place (Semantic Web approach) or the system decoding the data for use in a specific application. Both obviously present challenges, and it's not clear to me at this point which will "win." Maybe the "good enough and a person can go the last bit" approach really is appropriate - no system can be perfect! Or maybe as information systems evolve our standards for the performance of these systems will be raised to a degree where self-describing data is demanded. As a moderate, I guess I think both will probably be necessary for different uses. But which way will the library community go? Can we afford to have feet in both camps into the future?