Many libraries see efforts such as the Google Books Library Project and think they need to follow suit by digitizing books in order not to be left behind. I worry that many of these libraries are jumping in just to be on the bandwagon without fully considering where their efforts fit in with those of others. Digitizing books, performing dirty OCR, and making use of existing metadata is about as easy as it gets in the digital library world (not that this is exactly a walk in the park), so it's an attractive option for libraries looking to make a splash with their first efforts to deliver their local collections online.
I argue that this is not the right approach for most libraries. The impact libraries are looking for from digitizing local collections is achieved through the right ratio of benefit to users versus cost to the library. While it costs the library less to digitize already-described, published books sitting on the shelves, the benefits are also lower than those of focusing on other types of materials (more on which materials I'm thinking of later...). We already have reasonable access to the books in our collections. I'll be the first to go on and on ad infinitum about the poor intellectual access we currently provide to our library materials. But there is some intellectual access. For books a library doesn't own, interlibrary loan is a slightly cumbersome but mostly reasonable method of delivering a title to a user. There are also a (comparatively) great many digitized books out there, but no good registries of what's digitized and what isn't, and no good ways to share digital versions when they do exist and the institution that owns the files is willing to share. Take the Google project - they're digitizing collections from five major research libraries, yet libraries planning digitization projects don't have access to lists of the materials being digitized as part of this project, even though we expect to have some (not complete) access to these materials through Google's services at some point in the next few years. Even though library collections overlap less than one might expect, a library embarking on a digitization project for published books would, to some non-negligible extent, be duplicating effort already spent.
Libraries in the aggregate hold almost unimaginably vast amounts of material. We're simply never going to get around to digitizing all of it, or even the proportion we would select given any reasonable set of selection guidelines. Only a very small proportion of these materials are the "easy" type - books, published, with MARC records. The huge majority are rare or unique materials: historical photographs, letters, sound recordings, original works of art, rare imprints. These sorts of materials generally have grossly inadequate networked methods of intellectual discovery, or none at all. While digitizing these collections and delivering them online would take more time, effort and money than doing the same for published collections, I believe strongly that the increase in benefit greatly outweighs the additional costs. In the end, the impact of focusing our efforts on classes of materials that we currently underserve will be greater than that of taking the easy road. Our money is better spent on those materials that are held by individual libraries, held by few or no others, and to which virtually no intellectual access exists. Isn't this preferable to spending our money digitizing published books to which current access is reasonable, if not perfect?