Wednesday, April 05, 2006

Library digitization efforts

Many libraries are seeing efforts such as the Google Books Library Project, and think they need to follow suit by digitizing books in order not to be left behind. I worry that many of these libraries are jumping in just to be on the bandwagon without fully considering where their efforts fit in with those of others. Digitizing books, performing dirty OCR, and making use of existing metadata is about as easy as it gets in the digital library world (not that this is exactly a walk in the park), so it's an attractive option for libraries looking to make a splash with their first efforts to deliver their local collections online.

I argue that this is not the right approach for most libraries. The impact libraries are looking for as a result of digitization of local collections is achieved through the right ratio of benefit to users versus costs to the library. While the costs to the library are lower to digitize already-described, published books sitting on the shelves, the benefits are also lower than focusing on other types of materials (more on which materials I'm thinking of later...). We already have reasonable access to the books in our collection. I'll be the first to go on and on ad infinitum about the poor intellectual access we currently provide to our library materials. But there is some intellectual access. For books a library doesn't own, interlibrary loan is a slightly cumbersome but mostly reasonable method of delivering a title to a user. There are also a (comparatively) great many digitized books out there, without good registries of what's digitized and what isn't, or good ways to share digital versions when they do exist and the institution that owns the files is willing to share. Take the Google project - they're digitizing collections from five major research libraries, yet libraries planning digitization projects don't have access to lists of materials that are being digitized as part of this project, even though we expect to have some (not complete) access to these materials through Google's services at some point in the next few years. Even though library collections have less duplication than one might expect, a library embarking on a digitization project for published books would, to some non-negligible extent, be duplicating effort already spent.

Libraries in the aggregate hold almost unimaginably vast amounts of material. We're simply never going to get around to digitizing all of it, or even the proportion we would select given any reasonable set of selection guidelines. A very small proportion of these materials are the "easy" type - books, published, with MARC records. The huge majority are rare or unique materials: historical photographs, letters, sound recordings, original works of art, rare imprints. These sorts of materials generally have grossly inadequate or no networked method of intellectual discovery. While digitizing these collections and delivering them online would take more time, effort and money than published collections, I believe strongly that the increase in benefit greatly outweighs the additional costs. In the end, the impact of focusing our efforts on classes of materials that we currently underserve will be greater than taking the easy road. Our money is better spent focusing on those materials that are held by individual libraries, held by only a few or no others, and to which virtually no intellectual access exists. Isn't this preferable to spending our money digitizing published books to which current access is reasonable, if not perfect?


Thom said...

Wow--sounds logical to me!

I like your phrase: "networked method of intellectual discovery." Human beings with eyes and ears and memories just can't get those darned USB cards to fit in a slot to upload their information (yet).

becbristol said...

I am curious what your opinion would be on images. It seems that many libraries are now converting slides to digital images, yet most of these images can be located somewhere else digitally. Do you think it is possible to create one repository for digital images with good metadata?
I understand that there are many copyright issues, but if those can be resolved, would it be possible for a library to obtain a free subscription to a semantic digital image database?

Robin said...

The appeal of bringing rare and wonderful special collections material to a wider audience is clear, and most digitization programs in the big research libraries are trying to figure out the right balance between that and conversion of print collections. We're trying to get the most benefit for not only our users but for the world-wide community (both scholars and non-scholars) from our institution's limited digitization resources. And just to be the Devil's advocate: as valuable (and sexy!) as special collections material is, full-text searching across large collections of books and serials will expose for retrieval tremendous amounts of information never revealed by the access points of our catalogs.

About wanting a list of what's being digitized through Google Books: For all the press that this project has received, it is still in its early stages. Even if digitizing books is "about as easy as it gets", doing it on this scale poses some daunting logistical problems. However, participants are actively working with each other and with OCLC and RLG to ensure that both Google's versions and the versions at the contributing libraries will be visible in WorldCat and in the RLG Union Catalog. The plan is that the versions for which libraries take archival responsibility will be marked for inclusion in the Registry of Digital Masters. So, we can only ask for patience. The information will be available to the community as soon as we can manage it.