Over the holiday weekend, I read Thomas Mann's paper, "Will Google’s Keyword Searching Eliminate the Need for LC Cataloging and Classification?" Mann presumes to know exactly what is possible (not just what is currently implemented) in a search engine. The paper is stuffed full of absolutes such as "cannot," "only," and "will not." It treats Google as simply taking in the words of a query, looking each one up in a word-by-word index of all documents, and applying some sort of relevance ranking to the documents that contain the search terms. The paper not only assumes that Google takes this simplistic approach, but also rejects the idea that a search engine could do anything more.
I believe this is a thoroughly (and perhaps, in this case, deliberately) naive assessment of the situation. Just because library catalogs offer only simple fielded searching and straightforward keyword indexes doesn't mean every retrieval system works the same way. Mann ignores the possibility of a layer between the user's query and the word-by-word index. He states that the problem with "having only keyword access to content is that it cannot solve the problems of synonyms, variant phrases, and different languages being used for the same subjects." This statement confuses "keyword access" (simply looking a term up in a full-text index) with a system that uses a keyword index as one component among several. Google could expand search terms with their synonyms before sending the query to the full-text index. In fact, it already does so with the ~ operator (thanks, Pat, for the heads-up on this!), and who among us library folk is to say it won't do this by default in Google Print? The point is that this is entirely possible in a search system. The same goes for finding items in other languages: the query could be translated before the search is executed. Ordering, grouping (yes, grouping!), and presenting search results in this environment would require some advanced processing, but that's doable too.
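To make the point concrete, here is a minimal Python sketch of the kind of query-time layer Mann overlooks. It assumes a toy word-by-word inverted index and a hypothetical, hand-written expansion table (`EXPANSIONS`); all names here are illustrative and say nothing about how Google actually implements the ~ operator. The index is identical in both searches; only the step that rewrites the query before lookup changes.

```python
from collections import defaultdict

# Hypothetical expansion table. A real system might build this from a
# thesaurus, query logs, or (for other languages) a translation dictionary.
EXPANSIONS = {
    "car": {"automobile", "auto"},
    "film": {"movie"},
}


def tokenize(text):
    """Lowercase and split on whitespace: deliberately simplistic."""
    return text.lower().split()


def build_index(docs):
    """Word-by-word inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Plain keyword access: every query term must appear literally."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


def expand_term(term):
    """Query-time layer: the term plus any equivalents from the table."""
    return {term} | EXPANSIONS.get(term, set())


def expanded_search(index, query):
    """Same index, but a term matches if any of its equivalents appears."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = None
    for term in terms:
        postings = set()
        for variant in expand_term(term):
            postings |= index.get(variant, set())
        results = postings if results is None else results & postings
    return results


docs = {
    1: "history of the automobile",
    2: "a car buyer's guide",
    3: "silent movie history",
}
index = build_index(docs)

print(keyword_search(index, "car history"))   # set()
print(expanded_search(index, "car history"))  # {1}
print(expanded_search(index, "film"))         # {3}
```

The literal lookup misses document 1 because it says "automobile" rather than "car"; the expansion step finds it without touching the index. A translation dictionary could, in principle, slot into the same `expand_term` step.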
Of course, there is a difference between what's possible and what's actually implemented in Google today. Mann's language conflates the two: he makes (incorrect) claims about what's possible, using what's implemented as his evidence. What's implemented today is the functionality of the Web search engine, but we shouldn't assume the same functionality will drive Google Print. The article uses rhetoric to stir up librarians for their cause, but it does us a disservice by making false assumptions and obscuring the facts. There are arguments to be made for why libraries are still essential and relevant today, but rabble-rousing with partial truths isn't the way to make them.