Monday, December 19, 2005
Out of the loop
On Web4Lib, there was a ruckus recently about activism, copyright, and free expression. It at times descended into obscenities and name-calling, but also raised a number of thought-provoking questions about the information landscape and the maintenance of relevant and professional forums for discussion about issues that librarians should care about.
The Shifted Librarian, Jenny Levine, with a reasonable concern about the lack of comping registration fees for invited speakers at library conferences, sparked a rousing debate about conference economics, the value of institutional support for professional development, and a librarian's responsibility to give back to the profession.
I'm taking all of this as a reminder to step back from the inevitable daily emergencies and petty disagreements to think about the larger issues: why I'm a librarian in the first place, how I can contribute to our shared mission, and what our users really need in this day and age. I'm going to take some time over the next few weeks, as I have some time off (in between all the writing I have been putting off!), to reflect on these issues and re-focus my work. I hope everyone out there has a similar opportunity.
Thursday, December 08, 2005
DLF MODS Implementation Guidelines available for public comment and review
The primary goal of the Digital Library Federation's Aquifer Initiative is to enable distributed content to be used effectively by libraries and scholars for teaching, learning, and research. The provision of rich, shareable metadata for this distributed content is an important step towards this goal. To this end, the Metadata Working Group of the DLF Aquifer Initiative has developed a set of implementation guidelines for the Metadata Object Description Schema (MODS). These guidelines are meant specifically for metadata records that are to be shared (whether by the Open Archives Initiative Protocol for Metadata Harvesting (OAI PMH) or other means) and that describe digital cultural heritage and humanities-based scholarly resources. The Guidelines are available at http://www.diglib.org/aquifer/DLF_MODS_ImpGuidelines_ver4.pdf (pdf document about 470 kb).
In order to ensure the Implementation Guidelines are useful and coherent, we are collecting comments and feedback from the wider digital library community. We appreciate any and all comments, feedback, and questions. These may be sent to DLF-MODS-GUIDELINES-COMMENTS-L@LISTSERV.INDIANA.EDU. The deadline for comments and review is January 20, 2006.
DLF Aquifer Metadata Working Group:
Sarah Shreeves (Chair) - University of Illinois at Urbana-Champaign John Chapman - University of Minnesota Bill Landis - California Digital Library Liz Milewicz - Emory University David Reynolds - Johns Hopkins University Jenn Riley - Indiana University Gary Shawver - New York University
Tuesday, November 15, 2005
Learning cool new things
Saturday, November 05, 2005
True folksonomic thesauri?
Yet I'd never thought about relationships for folksonomic vocabularies before. I think it's a fantastic idea, however. The same strategies for improving end-user discovery based on term relationships can be used no matter where these relationships come from. Relationships determined by methods such as this could be used in the same way human-generated relationships in a formal thesaurus could be used. I wonder if these relationships might be even more important in a folksonomic environment, as a method by which the vocabulary control us library folk hold so dear could be achieved.
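To make that concrete, here's a minimal sketch of the kind of automatically derived relationship I have in mind: inferring candidate related terms from how often tags co-occur on the same resources. The tag data and the threshold are entirely made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sets of tags applied to individual resources.
taggings = [
    {"symphony", "orchestra", "beethoven"},
    {"symphony", "orchestra", "classical"},
    {"symphony", "beethoven", "classical"},
    {"jazz", "saxophone", "improvisation"},
]

# Count how often each pair of tags appears on the same resource.
pair_counts = Counter()
for tags in taggings:
    for a, b in combinations(sorted(tags), 2):
        pair_counts[(a, b)] += 1

def related_terms(term, threshold=2):
    """Suggest related-term candidates: tags that co-occur with the given
    tag at least `threshold` times across the collection."""
    related = set()
    for (a, b), count in pair_counts.items():
        if count >= threshold and term in (a, b):
            related.add(b if a == term else a)
    return related

print(related_terms("symphony"))  # {'orchestra', 'beethoven', 'classical'}
```

Relationships harvested this way could then feed the same "see also" displays a formal thesaurus would.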
Wednesday, October 26, 2005
Hierarchical catalog records
I believe the same is true of efforts to use MARC for FRBRized records. The MARC format could be adopted for this purpose. But is it in our best interests to do so? Using MARC makes the task seem less scary, as if it won't be that difficult. But it is a difficult task, and we're fooling ourselves if we pretend otherwise. I wonder if we aren't better off addressing the issue head-on, admitting to a change with a new base record format. The change would be one of mind-set, rather than functionality.
I've mentioned I believe the FRBRization task is difficult. I don't believe difficult means impossible in this case, however. We don't yet have a good sense of the cost associated with such a conversion, so any claim to its value will be tempered by that uncertainty. But I am convinced of that value, and I believe studies like that of the Perseus Digital Library are vital in demonstrating it. No cost can be justified without first understanding the associated benefit. We have a great deal more work to do to reach that understanding.
Sunday, October 23, 2005
You know you go to too many conferences when...
Saturday, October 22, 2005
Separating data entry from data structure
But of course current technology provides many possibilities for a design layer in between the data entry interface and the data storage format. Metadata creation by humans is expensive. We need to do everything we can to design data entry interfaces that speed this process along, that help the cataloger to create high-quality data quickly. Visual cues, tab completion, and keyboard shortcuts are just a few simple tricks that could help. More fundamental approaches like automatic inclusion of boilerplate text and integration of controlled vocabularies could provide enormous strides forward.
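As a trivial illustration of that in-between layer (a sketch only, with a made-up vocabulary and storage format), the entry interface can complete controlled vocabulary terms as the cataloger types while keeping the storage format entirely out of sight:

```python
# A made-up slice of a controlled vocabulary; a real interface would load
# this from an authority source.
VOCABULARY = ["Symphonies", "Symphonies (Band)", "Symphonies (Chamber orchestra)",
              "Sonatas", "Songs"]

def complete(prefix):
    """Offer controlled-vocabulary completions for what the cataloger has
    typed so far (case-insensitive prefix match)."""
    p = prefix.lower()
    return [term for term in VOCABULARY if term.lower().startswith(p)]

def to_storage(subjects):
    """A separate concern entirely: serialize the entered values into
    whatever the underlying record format expects (here, just a dict)."""
    return {"subjects": subjects}

suggestions = complete("sym")          # ['Symphonies', 'Symphonies (Band)', ...]
record = to_storage([suggestions[0]])  # the cataloger picks one; storage stays hidden
print(suggestions, record)
```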
Yet with all of this potential, I frequently (WAAAAAY too frequently) have conversations with librarians where it becomes clear they're focused exclusively on the data output format. It never even occurred to them that a system could do something with entered data that doesn't require cataloger involvement. (Man, I knew we librarians were control freaks, but this really takes the cake.) Of course, librarians aren't on the whole system designers. That's OK. But all librarians still need to be able to think creatively about possibilities. I'm convinced that the way forward here is to take the initiative to develop systems that demonstrate this potential, that show everyone what is possible with today's technology. Everyone has vision, yet that vision always has limits. By demonstrating explicitly a few steps forward from where we are, vision can then expand that much further.
Sunday, October 09, 2005
Museums and user-contributed metadata
So imagine my pleasure when, catching up on my reading this weekend, I came across "Social Terminology Enhancement through Vernacular Engagement" by David Bearman and Jennifer Trant in September's D-Lib Magazine. (Yes, I do know it's no longer September. Thanks for asking.) I'm thrilled to hear about this initiative, especially how well-developed it seems to be. I haven't yet followed the citations in the article to read any of the project documentation, but it certainly looks extensive. In the digital library (and museum!) world, I firmly believe ongoing documentation such as this associated with a project can be of as much or even more value than formally-published reports.
Two features strike me about the "Steve" system described here, that make it clear to me there are many ways to implement systems collecting metadata from users. It also makes me realize these decisions need to be made at the very beginning of a project, as they drive all other implementation decisions. The first is an assumption that the user interacting with the system is charged with the task of description rather than simply reacting to something they see and perceive either as an error or an omission. The user is interacting with the system for the purpose of contributing metadata; finding resources relevant to an information need is not the point. I suppose different users end up contributing with this model than with one that allows users to comment casually on resources they find in the course of doing other work. Different users might affect the "authoritativeness" of the metadata being contributed, but I wonder to what degree.
The second feature I find notable is that the system is designed to be folksonomic; there is no attempt at vocabulary control. Us library folk tend to start from the assumption that controlled vocabulary is better than uncontrolled and move on from there. At first glance, some of the reports from this project seem to resist that assumption, and start from the beginning looking for a real comparison. I'm anxious to read on.
Thursday, October 06, 2005
User-contributed metadata
But, anyhoo... incorporating user-contributed metadata into library systems is something I've been thinking about for a while. Librarians tend to be pretty wedded to the notion of authority, that as curators of knowledge we're the best qualified folks out there to perform the documentation of bibliographic information. Assuming for a moment that this is true for some data elements, there are still several classes of data that could easily benefit from end-user involvement.
The first is detailed information from specialized domains. I work on a number of projects related to music. Information such as exactly which members of a jazz combo play on any given piece on a CD or the date of composition of a relatively obscure work is the sort of thing our catalogs could be providing to serve as research systems instead of just finding systems. But this sort of metadata is expensive to create; it requires research and domain expertise on the part of the cataloger. Many of our users, however, do have this specialized knowledge and love to share it.
Other information that might be appropriate for end-users to supply includes tables of contents, instrumentation of a musical work, language of a text, and other "objective" information of this type. Before you say, "But what about standard terminology, spelling, capitalization?!?" in a panicked voice, consider basic interface capabilities in 21st-century systems such as picking values from provided lists rather than typing them in.
But should we restrict ourselves to these more obvious elements? I've been hoping for some time to be able to test various degrees of vetting of user-contributed metadata in a digital library system. I have in mind everything from a completely open Wiki-type system to one that simply sends a suggestion to a cataloger, with a number of options in between. I suspect the quality of the user-contributed metadata will be overall much higher than critics assume. Yet even if it isn't, what sort of trade-off between quality and quantity are we willing to make? Traditional cataloging operations don't have extensive quality control processes, perhaps because QC is expensive work. And catalogers make mistakes, every day, just like the rest of us. Assuming a system where users can correct errors, how quickly will errors (made by a cataloger or by another end-user) be found and corrected? Will the "correct" data win out in the end? Surely these issues are worth a serious look.
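For what it's worth, here's a rough sketch of the spectrum of vetting options I have in mind; the level names and the routing logic are purely hypothetical.

```python
from enum import Enum

class Vetting(Enum):
    OPEN_WIKI = "goes live immediately; anyone may edit"
    POST_REVIEW = "goes live immediately; a cataloger reviews it later"
    PRE_REVIEW = "held as a suggestion until a cataloger approves it"

def handle_contribution(contribution, policy, record, review_queue):
    """Route a user-contributed metadata value according to the vetting policy."""
    if policy is Vetting.PRE_REVIEW:
        review_queue.append(contribution)        # a cataloger sees it first
    else:
        record.setdefault("user_supplied", []).append(contribution)
        if policy is Vetting.POST_REVIEW:
            review_queue.append(contribution)    # flagged for later spot-checking

record, queue = {}, []
handle_contribution({"field": "table_of_contents", "value": "1. Allegro ..."},
                    Vetting.POST_REVIEW, record, queue)
print(record)
print(queue)
```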
Tuesday, September 27, 2005
The more things change...
The rationale behind the MARC music format reads full of hope, for improved access for users and higher quality data. Yet many of the improvements mentioned have not come to fruition. I'm heartened to see the vision represented here for the type of access we can and should be providing. Yet I'm discouraged to see more evidence that we haven't achieved this level of access in the time since the MARC format was implemented. I believe this serves to remind us that many factors other than database structure contribute to the success of a library system.
I also learned a valuable lesson reading this text that ideas and potential alone are not enough to convince everyone that any given change is a good idea. A large percentage of librarians out there have heard these very arguments before and seen them not pan out. I do believe, however, that this time can be different. (Yes, I know how that sounds...) Computer systems are much more flexible than they were when the MARC music format was first implemented, and can be designed to alleviate more of the human effort than before. We've learned a great deal from automation and implementation of the MARC format that we can build on in the next generation library catalog. We have a long road ahead of us, but I think it's time to address these issues head-on once again. I'd like to believe we can leverage the experience of those like Donald Siebert involved in the first round of MARC implementation, together with experts in recent developments, to make progress towards our larger goal.
Sunday, September 18, 2005
The next big thing in searching?
So what's the third generation? Where are we going next? I think the next step is grouping in search results. Grouping is where I see the power of Google-like search systems merging with library priorities like vocabulary control. Imagine systems that allow the user to explore (and refine) a result set by a specific meaning of a search term that has multiple meanings, by format, or by any number of other features meaningful to that user for that query at that time. I picture highly adaptive systems far more interactive than those we see today. Options for search refinement alone, I don't believe, go far enough, as they require the user to deduce patterns in the result set. I believe systems should explicitly tell users about some of those patterns and use them to present the result set in a more meaningful way. Search engines like Clusty are starting to incorporate some of these ideas. It remains to be seen if they catch on.
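A toy sketch of what I mean by grouping, with entirely made-up data; a real system would derive the groupings from vocabularies, clustering, or other evidence rather than from a hand-assigned facet.

```python
from collections import defaultdict

# Hypothetical hits for the ambiguous query "mercury".
results = [
    {"title": "Mercury: the planet observed", "facet": "Astronomy"},
    {"title": "Mercury poisoning in fish", "facet": "Toxicology"},
    {"title": "Freddie Mercury: a biography", "facet": "Popular music"},
    {"title": "Surface features of Mercury", "facet": "Astronomy"},
]

def group_results(hits, key="facet"):
    """Present the result set grouped by a feature meaningful to the user,
    rather than as one long relevance-ranked list."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[key]].append(hit["title"])
    return groups

for group, titles in group_results(results).items():
    print(f"{group} ({len(titles)})")
    for title in titles:
        print("  -", title)
```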
FRBR assumes this sort of grouping can be provided, using the different levels of group 1 entities. Discussions of FRBR displays frequently talk about presenting Expressions with a language for textual items, with a director for film, or with a performer for music, allowing users to select the Expression most useful to them before viewing Manifestations. What's missing is how the system knows what bits of information would be relevant for distinguishing between Expressions, since these bits of information will be different for different types of materials, and sometimes even with similar types of materials. We have a ways to go before the type of system I'm imagining reaches maturity.
Wednesday, September 07, 2005
Dangers of assumptions
I believe this is a thoroughly (and perhaps, in this case, deliberately) naive assessment of the situation. Just because library catalogs offer only simple fielded searching and straightforward keyword indexes doesn't mean all retrieval systems do the same. Mann ignores the possibility of a layer between the user's query and the word-by-word index. He states that the problem with "having only keyword access to content is that it cannot solve the problems of synonyms, variant phrases, and different languages being used for the same subjects." This statement confuses "keyword access" (just looking something up in a full-text index) with a system that uses a keyword index among other things for searching. Google could do synonym expansion on search terms before sending the query to the full-text index (and right now, does, with the ~ operator [thanks Pat, for the heads up on this!], and who of us library folk is to say they won't do this by default in Google Print). The point is, it's not impossible to do this in a search system. The same idea goes for finding items in other languages - translation before the search is actually executed could be done. Ordering, grouping (yes, grouping!), and presentation of search results in this environment would require some advanced processing, but that's doable too.
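To show how small the missing piece is, here's a sketch of that in-between layer with a made-up synonym table. This is not how Google actually implements the ~ operator; it's just an illustration that expansion before the index is consulted is straightforward.

```python
# Hypothetical synonym/lead-in table; a real system might draw on a thesaurus,
# an authority file, or statistical analysis instead.
SYNONYMS = {
    "cars": ["automobiles", "autos"],
    "movies": ["films", "motion pictures"],
}

def expand_query(query):
    """Expand each query term with its synonyms before the query ever reaches
    the word-by-word full-text index."""
    clauses = []
    for term in query.lower().split():
        variants = [term] + SYNONYMS.get(term, [])
        clauses.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(clauses)

print(expand_query("vintage cars"))
# (vintage) AND (cars OR automobiles OR autos)
```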
Of course, there is a difference between what's possible and what's actually implemented in Google today. Mann's language confuses the two, stating (incorrectly) what's possible and offering what's implemented as evidence. What's implemented today is the functionality in the Web search engine, but we shouldn't assume the same functionality will drive Google Print. This article uses rhetoric to stir the librarians up for their cause. But it does us a disservice by making false assumptions and obscuring the facts. There are arguments to be made for why libraries are still essential and relevant today. But rabble-rousing with partial truths isn't the way to make them.
Monday, August 29, 2005
Google Print and Fair Use
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
Note that whether the copyright owner objects or not is not a factor to be considered when determining fair use. That copyright owner could file a lawsuit, but the fair use claim is evaluated on these four factors only.
So how does Google Print stack up against the four factors?
(1) Purpose and character. Commercial vs. educational is singled out here, and certainly Google's use is commercial. But that's not the only purpose or character allowed to be considered. A lawyer for Google could claim that their service, meeting people's information needs and directing them to a copyright holder when a work meets that information need, is a Good Thing. They could then go on to argue that making money off of this is secondary, but lots of folks wouldn't believe that.
(2) Nature of the copyrighted work. This is hard to pin down due to the scope of what's being digitized. Books that have been out of print for 45 years and aren't widely available in the used book market would evaluate differently according to this criterion than Harry Potter. (Yes, research libraries collect fiction too.)
(3) Amount of the work. Again, tricky. Google is digitizing (copying) the entire work, and, presumably, using the entire work to create their index. The counter-argument seems to be they're only showing a small part to users of their service, but I don't believe that applies here. The exclusive right is the copying part, not what you show to other people.
(4) Effect on the market. Here is where only showing snippets to end-users comes into play. Certainly the effect on the market is potentially severe if one could download, print, or read a whole book from Google instead of purchasing it. The recording industry feels that way about file sharing, but there are many who disagree, claiming file sharing actually stimulates purchasing. (Sorry, no citations right now, but there are gobs of studies out there on both sides of this issue.) I imagine Google would claim that by showing snippets they're telling users about resources they didn't know about before, and are thus adding to the market. This will be an interesting argument to follow.
My conclusion is that the fair use claim is far from a slam dunk in either direction. Personally, I'd love to see this litigated (and found in favor of Google!) to start what I consider to be much-needed reform in copyright law.
IANAL. Any misinterpretations or flawed analyses are entirely mine, and the result of me trying to pretend I know something about this stuff.
Sunday, August 28, 2005
Musings on the state of copyright
The recent brou-ha-ha (wow, I think that’s the first time I’ve ever written that word down!) over Google Print has me thinking about copyright law. I am not a lawyer. I have no legal training or education. I have picked up a bit about copyright law while working in the area of digital libraries for the past five years, however. I think what I think I know is accurate, but hey, I'm wrong a reasonable amount of the time.
Thursday, August 11, 2005
A billion and one, a billion and two...
The union catalog has transformed the way libraries provide access to their material. A billion holdings in one database seems to me to be proof positive of that. But OCLC Research staff and many others, researchers and practitioners, aren't content with the functionality our current union catalogs offer. The enormous wealth of data represented by those one billion holdings has the potential to be used in innumerable ways. I believe OCLC's FRBR activities are excellent examples of the sorts of creative things we can do with this data to better serve our users. We've made huge strides in access to materials, yet we have many miles to go.
UPDATE: I've discovered today the misfortune of having a book on The Monkees be Worldcat's one billionth holding. We're going to have a country of librarians walking around for two weeks now with that damn theme song stuck in our heads!
Wednesday, August 10, 2005
Keeping up with technology
I follow a number of library- and technology-related blogs. Many of them hype a certain technology that is meaningful to the blogger for their particular needs. I learn a huge amount from these bloggers, the information they provide, and the fervor with which they provide it. But rarely do I go out and try any of the technologies being described just to see what they are. A few pique my curiosity and I go check them out, but for the majority I just mentally file the information away for when I have a problem the technology in question solves. There's just too much going on in this environment right now to really delve in and learn everything new that comes along. Each of us picks up on the emerging technologies most relevant to us in our personal or professional lives. Other technologies are only relevant to us at a later time, but hearing about them before we need them reminds us of the vast range of possibility out there. Sharing our experiences helps others both to adopt them right away when appropriate and to adopt them later as the need grows.
Tuesday, August 02, 2005
To each their own "metadata"
Everyone has their talents and areas of difficulty. We're all really good at some things and equally bad at others. Me, I'm completely spatially inept. It once took me 3 hours to put together a futon frame (with instructions). I'm fine with that, because I know my talents lie elsewhere, although I do often think it would be nice to be handy. Despite my lack of innate talent in some areas, I've never thought I simply can't learn any of it. Little by little I'll learn to fix things around the house. I'll never be able to paint with any level of inspiration, but with a whole lot of practice I might be able to use color effectively or produce a still life that is recognizable. One might think metadata is uninteresting. That's cool. I find a lot of stuff out there uninteresting. But don't think it's unlearnable.
Part of the problem here is that "metadata" isn't a monolithic concept. Depending on one's perspective, it can mean virtually anything. To lots of people, all they need is descriptive metadata, and maybe even some version of qualified Dublin Core their content management solution provides them. GIS specialists delve deeply into an area of metadata many know very little about. For many, text encoding is the metadata world, of extremely rich depth and subtlety. I had an interesting conversation recently with a colleague about the definition of "structural" metadata. By some definitions, TEI markup is structural metadata, indicating the structure of the text by surrounding that text with tags. Does that same logic apply to music encoding? Music markup languages specify the musical features themselves, rather than "surrounding" them with metadata. But certainly there's some similarity to text markup. The boundary between structural metadata and markup isn't the same to everyone. Similarly, there are times when I use the word metadata to refer to something that might more accurately be "data," and when I use it to refer to something that might be "meta-metadata."
All of these views are valid. I'm constantly reminding myself of this. Often when my first reaction is that someone doesn't get it, it's really their view not quite meshing with mine. It's important that we have some common terminology and meanings, but I believe there's room for perspective as well. I can get better at my job if I listen more closely to these perspectives.
Thursday, July 28, 2005
Music subject headings
One of those many floating thoughts has been subject headings for music. Many traditional schemes, like LCSH, make a distinction between headings used for works about music, and headings used for music itself. For example, "Symphonies" is used for music scores and recordings of symphonies. But "Symphony" is assigned to texts about symphonies.
Obviously at first glance the distinction between the two forms is subtle. Even if a user realized the potential for this distinction being made (!), it would be difficult for that user to determine which form to use in which case. In my library catalog, a subject browse on "symphonies" lists first an entry for 5407 matches, then second, "see related headings for: symphonies." Clicking on the latter yields a screen saying "Search topics related to the subject SYMPHONIES," but no way to actually do that. This is probably because the authority record for symphonies has no 550s specifying any related headings. Geez. That's annoying both because the system shows this screen anyways and because there are no related headings. [Yet another NOTE: the mechanism for specifying that a heading is broader or narrower than another heading in the MARC authority format is ridiculously complicated. No wonder the relationships between LCSH headings are so poor.] This same screen is also where one would view the scope note for the heading "symphonies":
Here are entered symphonies for orchestra. Symphonies for other mediums of performance are entered under this heading followed by the medium, e.g. Symphonies (Band); Symphonies (Chamber orchestra). Works about the symphony are entered under Symphony.
OK. So to find out if "symphonies" is what I'm looking for, I need to click "see related headings for: symphonies"? Riiiiight. Sure, my catalog could handle this better. Not many do.
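For the record, specifying those relationships isn't impossible, just clunky. Here's a rough, hypothetical sketch of what broader- and narrower-term tracings look like in the authority format; the particular headings are invented and the real LCSH record for Symphonies may differ.

```python
# Hypothetical 550 (see-also-from, topical term) tracings for "Symphonies".
# In the MARC authority format, the first character of subfield $w carries a
# relationship code: 'g' for a broader term, 'h' for a narrower term.
tracings = [
    {"tag": "550", "w": "g", "a": "Orchestral music"},    # broader term (hypothetical)
    {"tag": "550", "w": "h", "a": "Symphonies (Band)"},   # narrower term (hypothetical)
]

for field in tracings:
    print(f"{field['tag']}    $w {field['w']} $a {field['a']}")
```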
This distinction isn't always so obvious to specialists, either. I've been reading up on the topic for a project and I'm struck by how rarely it's made explicit. A huge majority of writings simply assume they're talking about one, the other, or both, but never say so. Many others indicate they're discussing one or the other but provide examples of both. I myself recently forgot the distinction at a critical juncture. :-)
I'm wondering if this distinction between headings for works about music and works of music is still needed in modern systems. [NOTE: I don't consider any of the MARC catalogs I'm familiar with to be "modern systems"!] We certainly now have mechanisms to make this distinction in ways other than a subject string. Most of me says this is an outdated mechanism. But in a huge library catalog covering both types of materials, the distinction does need to be made in some way. I'm still pondering over exactly which way that should be.
Monday, July 11, 2005
Structure standards and content standards
One place this trend caught my eye recently was in a blog post by Christopher Harris on using LII's RSS feed to generate MARC records, and subsequent comments and posts by several people, including Karen Schneider of LII. Most of the ensuing discussion was about keeping the two data sources in sync, which of course is important to plan for. But I noted a conspicuous absence of content standards in the discussion. MARC records, of course, do not have to adhere to AACR2 practices. In fact, there are millions of non-AACR2 records (mostly created pre-AACR2 and never upgraded for practical reasons) in our catalogs. But today if one is creating a MARC record, it would be prudent to either use AACR2 or have a compelling argument against it. Yet neither of those options appeared in this discussion. Reading between the lines, I suspect the transformation should be reasonably straightforward, but one shouldn't have to read between the lines to know.
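To make the gap concrete, here's a hypothetical sketch of the sort of mapping under discussion. The structural part, which fields the data lands in, is easy to write down; the content-standard questions (is the title transcribed according to AACR2? what rules govern the summary?) are exactly the part left unstated.

```python
# A hypothetical item as it might arrive from an RSS feed.
rss_item = {
    "title": "Librarians' Internet Index: New This Week",
    "link": "http://lii.org/",
    "description": "Selected, annotated web resources.",
}

def rss_to_marc_fields(item):
    """Map RSS elements onto MARC fields. This settles the structure standard;
    it says nothing about whether the values follow AACR2 or any other
    content standard, which is the question missing from the discussion."""
    return {
        "245": {"a": item["title"]},        # title -- transcribed under what rules?
        "520": {"a": item["description"]},  # summary note
        "856": {"u": item["link"]},         # electronic location and access
    }

print(rss_to_marc_fields(rss_item))
```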
I suppose what I'm really saying here is that when talking about these sorts of activities, we need to completely define the problem to be solved before a solution can be determined. And that includes dealing with content standards in addition to structure standards. Explicitly. Knowing which standards (or lack of them) are in use in the source data and which are expected in the target schema. Planning for moving between them. This is an extremely interesting topic, and I personally would love to see more discussion about it.
Oh, and, for the record, I'm with Karen that one would want to be careful about putting lots of records for things like LII content into our MARC catalogs. My vision (imperfectly focused, unfortunately!) is that because the format (and the content standard that is normally used with it) doesn't describe this type of material well, and the systems in which we store and deliver our MARC records don't provide the sort of retrieval we might desire for these materials, our users would be better served by a layer on top of the catalog that also provides retrieval on other information sources better suited to describing these materials. This higher-level system would provide some basic searching but most importantly lead a user down into specific information sources that best meet his needs. We have lots of technologies and bits of applications that might be used for this purpose. I wonder what will emerge.
Wednesday, July 06, 2005
So what's up with RDF?
And all of this banter reminds me I need to learn RelaxNG and finally figure out what the deal is with topic maps. Anybody have a few extra hours in their day they're willing to send my way? :-)
Tuesday, July 05, 2005
Addition of dates to existing name headings
100 1# $a Bernstein, Leonard, $d 1918-
This heading then was not changed when Bernstein died in 1990. The CPSO proposal notes that libraries, including LC, receive frequent comments and complaints from users regarding the "out of date" nature of headings of this sort.
In discussion of this policy on the AUTOCAT listserv, the question arose as to whether name authority files served simply to generate unique headings for a person, or whether they served a wider biographical function. Certainly historically the former is true. But many, including the CPSO, are recognizing that increasingly we may be well served by delving into the latter. We have an opportunity here to become more useful and relevant to the wider information community. To take that opportunity might seem to be a no-brainer.
However, the current cataloging infrastructure makes the implementation of this change challenging, to say the least. Because authority data is replicated in local catalogs and the shared environment, and most integrated library systems store actual heading strings in bibliographic records rather than pointers to authority records, changing a heading requires notifying all libraries that a change has been made, propagating that change from one library to the rest, then continuing to propagate that change in every local system to all affected bibliographic records. Clearly this mechanism is anachronistic in today's networked world, where relational databases are so entrenched as to be considered almost quaint. I fully understand the practical implications of the CPSO implementing this policy. Yet I believe that it is the right thing to do. We as librarians simply must have a vision for what we're trying to accomplish, and work tirelessly towards that goal. While we must keep the practical considerations in mind, we can't let them dictate all of our other decisions. Let's set the policy to do the right thing, and insist on systems that support our goals.
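A toy sketch of the difference, with hypothetical identifiers and field names, just to show how modest the technical piece is compared to the infrastructure and policy pieces:

```python
# Status quo: the heading string is copied into every bibliographic record,
# so changing the heading means touching (and redistributing) every record.
bibs_with_strings = [
    {"id": 1, "author_heading": "Bernstein, Leonard, 1918-"},
    {"id": 2, "author_heading": "Bernstein, Leonard, 1918-"},
]

# Alternative: bib records point at the authority record, and the heading
# lives in exactly one place. (The identifier below is made up.)
authorities = {"auth-0001": {"heading": "Bernstein, Leonard, 1918-"}}
bibs_with_pointers = [
    {"id": 1, "author_ref": "auth-0001"},
    {"id": 2, "author_ref": "auth-0001"},
]

# Closing the date is one update, not a cascade of notifications.
authorities["auth-0001"]["heading"] = "Bernstein, Leonard, 1918-1990"
for bib in bibs_with_pointers:
    print(bib["id"], authorities[bib["author_ref"]]["heading"])
```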
Tuesday, June 28, 2005
Back from ALA
I spent most of my time at ALA attending presentations I "had" to attend--those related to my daily work. I was able to spend a small amount of time expanding my horizons, but I wish I could have done more. And this schedule is without being involved in any ALA committees that meet during the conference. There is simply too much going on to take advantage of it all.
On another note: on the trip home I started reading, but didn't finish, Martha Yee's recent paper outlining a "MARC 21 Shopping List." I should hold any substantial comment until I finish the article, but so far I'm impressed. The approach of looking very precisely at the criticism of MARC and current cataloging practice to determine what exactly is being criticized, I believe, is long overdue. I do find myself thinking of counter-arguments to some of the conclusions, however. But intelligent discourse is absolutely what we should be striving for!
Thursday, June 23, 2005
Coming out of the woodwork
FRBR is a good example. A colleague of mine recently described FRBR as a "religion," and I think that's not entirely untrue. But I'm increasingly seeing rank-and-file librarians (not just us "digital" folks or special collections librarians who do things "differently" anyways, according to one popular perception) show an interest in it. These folks commonly just want to learn what it is and what it can do for them. They aren't interested in jumping on a bandwagon just to be there. Rather, they genuinely want to evaluate for themselves the value of the model to them and their users. Sure, there are now and will always be extremists on both sides of the issue. I know librarians who want nothing to do with FRBR, and I know others who insist nothing from today's bibliographic control practices will be of any use in five years. But thankfully most of us fall somewhere in the middle.
I see huge numbers of librarians willing to talk about their ideas, even if they represent a departure at some small or vast level from current practice. I see huge numbers of librarians taking analytical approaches to solving real access problems they deal with every day. I see huge numbers of librarians keeping the overall goals of access and preservation of intellectual output foremost in their minds as they look for solutions. I see huge numbers of librarians having lively, interesting, professional discussions about options for achieving these goals. I love my job.
Friday, June 17, 2005
DCMI & Bibliographic Description
The recommendation is described as emerging from the need for describing journal articles in DC. The recommendations tend to center around putting the information that was previously problematic (journal title, volume number, issue number, page range, etc.) within a bibliographicCitation refinement for dc:identifier, while getting the rest of the citation information from other parts of the DC record. "Optionally, but redundantly, these details may be included in the citation as well." This optional part has huge consequences for anyone using DC metadata to get to these citations. One could never know if the complete citation is present in the dc:identifier.bibliographicCitation element, or if one needs to look elsewhere for information to complete the citation. Also, it results in a situation where some of the data needed for this citation is clearly fielded (author in dc:creator, article title in dc:title, etc.), but the rest of it is not. This is hardly an elegant solution to the problem at hand.
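The practical consequence for anyone consuming these records looks something like the sketch below (the element names follow the recommendation; the data and fallback logic are my own invention): you can never be sure which path yields a complete citation, so you code for both and hope.

```python
def assemble_citation(dc_record):
    """Try to produce a complete citation from a DC record that follows the
    recommendation. The consumer cannot know in advance whether
    dcterms:bibliographicCitation holds the whole citation or only part of it."""
    citation = dc_record.get("identifier.bibliographicCitation")
    if citation:
        return citation  # complete? partial? the record alone can't tell you
    # Otherwise piece something together from the separately fielded data.
    parts = [dc_record.get("creator"), dc_record.get("title"), dc_record.get("description")]
    return ". ".join(p for p in parts if p)

record_a = {"creator": "Doe, J.", "title": "An article",
            "identifier.bibliographicCitation": "Journal of Examples, 11(9), 2005, pp. 1-10"}
record_b = {"creator": "Doe, J.", "title": "An article",
            "description": "Published in the Journal of Examples."}
print(assemble_citation(record_a))
print(assemble_citation(record_b))
```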
Also, "there are no recommendations for providing bibliographic citations in simple Dublin Core." However, it is "suggested" that citation information be put in dc:identifier or dc:description. How is anybody suppposed to use DC for this purpose if the "experts" on it can't bring themselves to turn a "suggestion" into a "recommendation?" This document says to all of us out in metadata-land that there's a solution (actually, TWO solutions - identifier and description - choose between them randomly!), but the powers that be can't or won't formally endorse it, perhaps because it's viewed as a hack. This passive-aggressive "well, we see you have a problem and here are some possible ways to solve it on an official-looking document, but we're not going to tell you that we think any of these solutions are a good idea" crap is really starting to get on my nerves.
I'm also confused about something. bibliographicCitation is a refinement of dc:identifier, and therefore by the DC "dumb-down" rule is a type of identifier. The recommendation says, "dcterms:bibliographicCitation is an element refinement of dc:identifier, recognising that a bibliographic resource is effectively identified by its citation information." But then it goes on to say, "In Dublin Core Abstract Model terms the value of the dcterms:bibliographicCitation property is a value string, encoded according to a KEV ContextObject encoding scheme. It is not intended to be the resource identifier, which for a journal article would probably use an appropriate URI scheme such as DOI." So which is it? Is bibliographicCitation an identifier or not? Is the second quote using "identifier" to mean something different than dc:identifier without telling us? I'm willing to assume for now what I see as a contradiction here comes from my purely surface-level understanding of the DC Abstract model. But maybe not...
Monday, June 13, 2005
A gulf between research and practice
Nov. 2004, PhD dissertation, Marcos André Gonçalves, "Streams, Structures, Spaces, Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications", http://scholar.lib.vt.edu/theses/available/etd-12052004-135923/
... as it related to the topic at hand. The presenter hadn't heard of it, and neither had I. But why hadn't I heard of it?!? This sort of work should absolutely be on any digital library practitioner's reading list, and any researcher in this area, be it computer science (as this one was) or LIS, should have some familiarity and ongoing discourse with practitioners. Both pure research and pure implementations of digital libraries are necessary, but that doesn't mean there is no middle ground, or that the two can't engage each other in a meaningful way. My work will be better for having read this research, and research will be better for having learned about what departments like mine produce.
I think one reason for this gulf is the differing definition of "library" held by different folks. But that's a post for another day.
Wednesday, June 01, 2005
Beyond silly...
A post on Autocat last Friday asked about what to record in MARC 007 as the playing speed of a CD. The answer:
"Compact digital discs: Speed is measured in meters per second. This
is the distance covered on the disc's surface per second, and not the
number of revolutions.
f 1.4 m. per sec."
WHY, exactly, is this information important to be included in a MARC record? CDs and DVDs only play at one speed. I know that for analog discs (records, remember those?), one needs to know, for example, if it's a 45 or a 33 1/3, but not for the media currently under discussion! (And LP speeds are what they're *supposed* to be, not what they really should be to reproduce at pitch!) It strikes me very strongly as an anachronism, completely unnecessary in a bibliographic record for a CD created in 2005.
The conversation on Autocat then spun into a discussion of why it's not measured in revolutions per second, some technical details about how CD players work, etc. Interesting, certainly. But I'm a bit incredulous that the focus is on the method of measurement rather than the point of including that data in the first place! If and when CD players are historical artifacts, and all information on how they worked is lost, looking in MARC records and interpreting the very complex semantics of 007 is not going to be what reveals the speed at which they should play. Even if we should be recording this information for posterity (value for dollar, anyone?), it doesn't have to be in every single bib record for a CD! We record this information at the expense of far more important data, such as analytics for individual musical works on the recording. Please, please, please! Let's step back and think about why we create these records in the first place. AACR3 (oops, RDA!) is trying to do this, but I fear it's not going nearly far enough.
Rant over. I do realize there are lots of practical problems we have with legacy data if we're going to make large-scale changes to cataloging practice. Let's work to solve those problems and not let them scare us off from doing anything. There are lots and lots of folks out there doing just this stepping back I'm pleading for. Good work, all of you! Let's do some more.
UPDATE! I get AUTOCAT in digest mode, and wrote the above based on messages received up to the morning of 6/1. In the digest I received 6/2, there are no less than TWO posters wondering what the heck this stuff is doing in a MARC record anyways. There's also continued endless discussion about linear velocity, how the CD measurements relate to tape media, how they relate to the "48x" speed advertised for CD-ROMs, etc. It's great that folks want to really understand these things, but I'd still argue that preferencing this sort of information over lots of other useful information isn't the right thing to do.
Wednesday, May 25, 2005
Z39.19 Revision
8.3.3.2 Parts of Multiple Wholes
When a whole-part relationship is not exclusive to a particular pair of wholes, that is, when the part can belong to more than one whole, the name of the whole and its part(s) should not be linked hierarchically; they should be linked associatively rather than hierarchically. Carburetors, for example, are parts of machines other than cars; the relationship in this instance is cars RT carburetors.
I'm disappointed in this decision. In order to preserve a pure hierarchy (something cannot be a part of multiple wholes), some semantics are lost. The idea that a carburetor is a part of a car (as well as potentially a part of lots of other stuff) is lost by relegating it to an RT (associative relationship). Whole-part relationships appear in the document as one of three types of hierarchical relationships; therefore, it seems that by categorizing them here the authors were forced to make the decision to move a huge number of things commonly thought of as having a whole-part relationship to an associative relationship. We librarians just love hierarchy, don't we. Too bad the world is polyhierarchical. Looks like our information systems won't be able to catch up yet.
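To spell out what gets lost, here's a hypothetical sketch of the two approaches. In a polyhierarchical structure the part keeps an explicit whole-part link to each of its wholes; under the draft's rule, all of those links collapse into undifferentiated RTs.

```python
# Polyhierarchical view: a part may have several wholes, and the nature of
# each relationship (whole-part) is preserved. The wholes listed are made up.
part_of = {
    "carburetors": {"cars", "lawn mowers", "small aircraft engines"},
}

# Draft Z39.19 view: because the part belongs to more than one whole, every
# one of those links is demoted to an associative (RT) relationship.
related_to = {
    "carburetors": {"cars", "lawn mowers", "small aircraft engines"},
}

# The two sets look identical, but only the first structure can answer
# "what is a carburetor part of?" as opposed to "what is it merely related to?"
print("whole-part:", sorted(part_of["carburetors"]))
print("RT only:   ", sorted(related_to["carburetors"]))
```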
Monday, May 23, 2005
"In Search of the Single Search Box"
I heard an excellent presentation recently at the Digital Library Federation Spring Forum, which has been referenced recently on a library mailing list (WEB4LIB?). Staff at NC State have developed methods for making a single search box on the library's web site actually provide relevant information for the many types of queries users type into it, no matter how much explanatory information on the page indicates which resources that box searches. The presentation was titled "In Search of the Single Search Box: Building a 'First Step' Library Search Tool." (Firefox users beware: the presentation is in HTML-ized Powerpoint and will look really strange in your browser!) Their video demo does an excellent job of illustrating the types of information needs to which the tool can respond. As the presentation suggests, this box doesn't search inside absolutely everything, but is intended to be a first step from which users can see some ideas and choose among them for continuing their journey.
As I recall (this is what I get for waiting this long to post on the topic...), the tool presents results in four major categories:
1) FAQ for the libraries
2) Library web pages
3) Links to perform the same search in some databases (the catalog, Academic Search Premier, list of journal titles, etc.)
4) Related subject categories
The FAQs meet needs where somebody wants to know the library hours or where the closest computer lab is. The library web pages results are Google-driven, so a page excerpt appears that a user might find helpful in selecting a result when they want some contextual information about a resource. The "search the collection" links make catalog or database search results an extra click away if that was the desired search, but that click is simply moved from the beginning of the process (click a catalog link on the home page, or, alternatively, take a few minutes to figure out which box on the front page to type in!) to this stage.
The "Browse Subjects" area, where a list of potentially relevant subjects is displayed, peaks my interest most about this project. The presentation didn't have a ton of information about where these links go and how the logic to develop them is created, and unfortunately I didn't have a chance to ask the NC State folks in person more about it. But from the presentation and the demo video, it looks like these links go to pathfinder-style pages where "selected" resources (selected how and by who would presumably be a local implementation decision) are displayed or linked. The presentation slides state that journal article titles and course descriptions are currently used to provide the connections between search terms and the pre-defined subjects. That's a great place to start! One can imagine a host of other options, including subject authority files, those same library web pages indexed elsewhere, and periodic looks at search logs for this box. Oh, and I see now one of the final slides in the presentation talks about some other sources - I'd forgotten that! I find the huge amount of potential here very exciting.
This tool isn't currently deployed on the NC State Libraries Web site, but I hope to see it soon. I don't recall if they plan to release any of their source code, but it sure would be nice if this was possible. I'll be keeping an eye on developments in this area.
Oh, and by the way. Never. Moving. Again. :-)
Saturday, May 07, 2005
Cataloging sound recordings
I can't help but feel that, like a great many discussions of this sort, the participants are talking past each other. One point that has been mentioned, but perhaps not strongly enough, is that the user experience problems with library cataloging are heavily a problem of the use the search system makes of the data and how the data is presented to end-users. Ralph Papakhian, one of the premier music catalogers in the country, who I like and respect a great deal, has made the point in this thread that the data elements some respondents mention wanting to record are in fact recordable in MARC. And if anyone would know and can explain this to others, it's Ralph. But these elements, even though they're there, are often not accessible to users. For example, MARC has fields for date of composition and coded instrumentation of a recording or score. But few if any library systems index or display this data. So catalogers rarely enter them, which provides less incentive for systems to use them, which provides less incentive for catalogers to use them, which provides less incentive for systems to use them...
But I believe systems aren't the only problem. There are lots of little things I think MARC/AACR2 could do better. The biggest difference, though, mostly implicit in this discussion, between what MARC does and what some of the other participants in this thread look for in sound recording cataloging is the library focus on the carrier over the content. Catalogers discuss this issue frequently, but it hasn't been brought up explicitly in this thread. Audiophiles absolutely are interested in the recording as a whole--its matrix number, sound engineers, etc. But they are also equally interested in the musical works on the recording, what personnel are connected with which piece, timings of tracks, etc. MARC has places for these things, but they are relegated to second-class status. Catalogers know and tout the benefits of structure and authority control in information retrieval. But when it comes to the contents of a bibliographic item, we apply none of these principles in the MARC environment. Contents notes are largely unstructured (and what structure is possible is rarely used and keeps changing!), don't make use of name or title authority control, and in many cases aren't indexed in library systems.
As pointed out in this thread, creating this content-level information is extremely expensive. But the networked world has the potential to change that. Much of this information has been created in structured form outside of the library environment, by record companies, retailers, and enthusiasts, but we don't make use of it. Right now, it's difficult to make use of it because our systems don't know how to talk to each other. It will take a great many baby steps, but I hope we can start down the road towards changing that.
Matt Snyder of NYPL, who I met at MLA this year and was extremely impressed with, has made the point in this thread that MARC records (and, by extension, library catalogs) and discographies have different purposes. This is definitely true in today's environment. Library catalogs are primarily for locating things, and discographies have more of a research bent. But I feel strongly, and this email discussion seems to support this view, that the distinction is largely artificial and is becoming less relevant as information retrieval systems continue to evolve. More sharing of data between systems will hopefully result in fewer systems to consult by end-users. That's certainly my goal!
Thursday, May 05, 2005
Known-item vs. unknown-item searching
But look more closely at c) in that first objective - we should provide access to an item when the subject of it is known. So what exactly does that mean? Most current systems in a library environment fulfil that by making text in a subject-ish field keyword searchable. When I do a subject search in a system of that sort, I get back records that have subjects containing the word I typed in. But how do users know what the words in those subjects are? Some (certainly not all!) systems provide the user a way to look at a list of subjects used in that system. The user then is expected to locate all subjects of interest in that list, then construct a properly-formulated Boolean query OR-ing those subjects together. I'll be perfectly frank and state that I believe strongly that this is silly to expect of any user in this day and age, even an "expert" user such as a reference librarian. Let's use the computing power we have!
And what about these of Cutter's objectives?
2. To show what the library has
- e. On a given and related subjects
- f. In a given kind of literature
What we need are systems that do an exponentially better job of starting out from an interesting thing and finding more things like it. I personally think postcoordinated subject headings would be a major advance in this area, but they're certainly not enough. Systems that map lead-in terms to authorized terms, and expand search results to include narrower terms than a matched broader term are also necessary. One can also imagine other mechanisms to build that "like" relationship, based on information retrieval research, folksonomies, and transaction logs.
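A bare-bones sketch of those two mechanisms working together, using made-up vocabulary entries: map the user's lead-in term to the authorized heading, then expand the search to the narrower terms beneath it, so the Boolean OR is built by the system rather than by the user.

```python
# Hypothetical slice of a thesaurus: lead-in (use-for) terms and narrower-term links.
USE_FOR = {"heart attack": "Myocardial infarction"}
NARROWER = {"Myocardial infarction": ["Anterior wall myocardial infarction"]}

def expand_subject(user_term):
    """Map a lead-in term to its authorized heading, then include all narrower
    terms under that heading in the search."""
    heading = USE_FOR.get(user_term.lower(), user_term)
    terms, queue = [heading], list(NARROWER.get(heading, []))
    while queue:                      # walk down the hierarchy
        term = queue.pop()
        terms.append(term)
        queue.extend(NARROWER.get(term, []))
    return terms

print(expand_subject("heart attack"))
# ['Myocardial infarction', 'Anterior wall myocardial infarction']
```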
I suppose my point in the end is that it's simple to build a system that searches the text of pre-created metadata fields for an entered query string. It's much more difficult to build systems that allow users to truly explore. We often forget how important that exploration function is. We look at our search logs, and see mostly known-item searches, so we think that's what we need to focus on. Of course we see that - it's what our systems are designed around! But what would happen if we started to provide relevant results to subject and other unknown-item searches? I'd bet a whole lot of money that we'd see a huge increase in unknown-item searching. Sure, for some types of materials, known-item searching may very well be the primary means of access users need. But let's at least look at the alternative, and work with actual users to see how we can provide them with exploratory functions we don't currently supply.
Tuesday, May 03, 2005
FRBR Workshop
The workshop itself is a mixture of presentations on specific topics and time to just talk. Some presentations don't at first glance look to be FRBR related, but every single one really does have a definite impact on how FRBR should develop in the future, either as a conceptual model or as some sort of implementation model based on the conceptual one. Some presentation slides are on the workshop site now, and hopefully all will eventually be. But the presentation slides in no way do the actual presentations and the resulting large- and small-group discussions justice. I feel more confident than at most meetings of this type that the discussions will have real results, in the form of writings and implementations. I sincerely hope so - many people out there are interested in this topic, and the best thing we can do now is share, share, and share some more.
Thursday, April 28, 2005
Newsweek article on "tagging"
I'm enough of a skeptic to think it's not practical for libraries to switch wholesale to folksonomy-type endeavors for subject access, but surely there are ways in which we can capitalize on the wealth of relevant information being generated out there. I've been interested for some time in incorporating user-contributed metadata into a project I work on. My plugs for this to date have used Wikis as examples - I think I'm going to have to add folksonomies to my spiel!
Tuesday, April 26, 2005
AMeGA Automatic Metadata Generation final report
Of particular interest to me is Section 8, where proposed functionalities are listed for metadata generation applications. There are a number of very good suggestions here, often focusing on streamlining the metadata generation process - making use of automation when current technologies perform well, and making the human-generated part of the process easier. I definitely agree with the report that there is a huge disconnect today between research in this area and production systems. There is very interesting research in this area going on, but production systems don't yet make good use of it. Right now, we still need humans in the process. I'm not opposed on principle to changing this, but that's today's reality.
The report characterizes survey respondents as "optimists" and "skeptics," based on their projections of future abilities to automate metadata creation. The report quotes several skeptics as proclaiming it simply not possible, under any circumstances, to completely automate metadata creation. I'd like to think of myself as on the fence with regard to this issue. I don't like to say "never," but I do see that generation of certain types of metadata elements will be easier to automate than others. The more we can automate, the better. I also understand the problem with evaluating automatic metadata generation applications. Few people agree on appropriate subject headings, etc., so how do we know if a generated heading is appropriate? In my opinion, the more we can expose people to the results of generated metadata, the better we can evaluate it, and the better these systems will eventually get.
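As a trivial illustration of the easier end of that spectrum (my own sketch, not anything the report proposes), some candidate elements can be generated with almost no intelligence at all, which is part of why I expect the automatable portion to keep growing:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "with", "on", "both"}

def generate_keywords(text, n=5):
    """Propose candidate keyword metadata by simple frequency counting. Crude,
    but it's the kind of element where automation already performs tolerably;
    assigning subject headings is a much harder problem."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(n)]

sample = ("Metadata generation for digital libraries combines automatic extraction "
          "with human review. Metadata quality depends on both extraction and review.")
print(generate_keywords(sample))
```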
Wednesday, April 20, 2005
ANSI/NISO Z39.19 draft revision
ANSI/NISO has released a draft revision of Z39.19, now titled "Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies." I haven't had a chance to read the document yet, but it sure looks interesting! From the table of contents, I'm glad to see a small section on synonym rings, as we encountered these not working the way we expected in an implementation of OracleText. At first glance, the scope of the standard seems to have expanded. There are sub-sections of the "principles" section on ambiguity and facet analysis that I don't recall being in the existing standard (but don't quote me on that!). I'm extremely interested in the section on displaying controlled vocabularies. In my opinion this is the biggest barrier to end users of systems using controlled vocabularies today - displays that completely separate the vocabulary from the search interface, requiring users to know of their existence, understand their structure, and take the time to consult them! I look forward to seeing if this draft standard can make them more understandable.
Sunday, April 10, 2005
"Authority control in AACR3"
"The definition that is likely to be included in AACR3 is: 'the means by which entries for a specific entity are collocated under a single, unique authorized form of a heading; access is provided to that authorized form from variant forms; and relationships between entities are expressed.'"
Authority control for names certainly fulfils the collocating function described here, and, conversely, a disambiguation function, by creating different headings for different people with similar or identical names. But in today's information systems it can and should fulfil another function: helping users decide whether the name heading displayed to them is for the individual they're interested in. I believe only the first goal is served by a system where the uniqueness of a person is represented only by the form of the heading. Name authority files also don't completely disambiguate names; there are many cases of duplicate names in the authority file when no information other than what appears on a publication is available to the cataloger.
I can't help but wonder if we're missing an opportunity here to move to a structure that can more easily fulfil both goals. Information that would help a user decide if a person is the one they're interested in is frequently added to a name heading, but not always. If all of that information, plus any more that might be of use, were made available to the user in a flexible manner, rather than just the data necessary to distinguish one name from another, the second goal would be much more easily served. Perhaps this is not the time for such a change. But I do think we as librarians and system designers should be open to changes of this sort, continuing to focus primarily on the task we want to accomplish and leaving the mechanics of accomplishing it as a later step.
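To make the distinction concrete, here's a rough and entirely hypothetical sketch - not MARC, not any existing authority format, just an illustration of the idea - of what it might look like to treat the identity and its descriptive attributes as first-class data rather than packing everything into the heading string:

  <identity id="x0001">
    <nameForm preferred="true">Smith, John, 1945-</nameForm>
    <nameForm>Smith, J. A.</nameForm>
    <!-- attributes a user could consult when deciding "is this the person I mean?" -->
    <fieldOfActivity>Organic chemistry</fieldOfActivity>
    <affiliation>Example State University</affiliation>
    <associatedWork>Introduction to polymer synthesis</associatedWork>
  </identity>

The authorized form is still there for collocation and display, but the information a user needs in order to recognize the person is no longer locked inside the text of the heading.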
Friday, April 01, 2005
April Fools!
Sunday, March 27, 2005
Random thoughts on XOBIS
The most obvious difference I see between XOBIS and FRBR is that XOBIS attempts to be a model that can describe all of knowledge, while FRBR limits itself to modelling bibliographic relationships. In a practical sense, for recording bibliographic data (and this certainly isn't the only possible use of XOBIS!), this means that XOBIS explicitly handles entities that, in a bibliographic environment, represent creators or subjects of bibliographic items (and, in FRBR, other Group 1 entities) - entities that currently reside in a relatively unstructured way in name and subject authority files. FRBR, on the other hand, considers its Group 2 ("person" and "corporate body") and Group 3 ("concept," "object," "event," and "place") entities only briefly, focusing instead on Group 1 entities.
Relationships between entities are a key feature of XOBIS; they are also a bit confusing to me on a first read. My initial impression is that the relationships as specified focus more on subject-type relationships than on relationships among bibliographic items. My reading is that the XOBIS definition of a work is much closer to what we currently consider a bibliographic item than FRBR's work is. The discussion and examples in the overview document talk about versions of works and how they are related, but I saw much less about the "accidental" sort of relationship a FRBR-ish work (as it's expressed in a specific manifestation) would have to another expressed work on the same manifestation - for example, two symphonies appearing on the same CD. It would be an interesting exercise to map out how the XOBIS model would handle this sort of situation, where the symphony itself is the entity of primary interest to most end users, rather than the specific performance or the title of the CD on which it appears.
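Here, roughly, is how I picture the FRBR side of that CD scenario - a sketch of the conceptual model only, with invented details, not any particular encoding:

  Work: Symphony no. 5                    Work: Symphony no. 7
    Expression: the 1963 recording          Expression: the 1963 recording
           \                                    /
            Manifestation: the CD as published
              Item: our library's copy of that CD

Each work leads an independent life, with other expressions and manifestations elsewhere; their appearance together on this particular CD is, from the works' point of view, an accident of publishing. What I'd like to see mapped out is how XOBIS, with its more item-like notion of work, would express that same picture.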
XOBIS comes out of the Medlane project of the Lane Medical Library at Stanford. I wonder what effect medical materials have had on the development of the XOBIS model. I know my focus on musical materials in various projects, most notably Variations2, certainly strongly affects my thinking about FRBR and related efforts. I'm sure that's obvious from my earlier question wondering how XOBIS would handle a situation that the Variations2 model is designed around.
There are also some very interesting items in the report's bibliography, including a project mailing list (which has been renamed since the version listed here, and looks low-traffic). Time for citation chasing!
Wednesday, March 23, 2005
Postcoordinated subject headings
I was struck in the discussion by the widespread lack of big-picture thinking about the issue, and by the corresponding lack of awareness of the many initiatives going on in this area. There were certainly some participants who have spent time thinking about this issue, but many seemed afraid of the idea. I got the sense that many folks were trained on LCSH, that's what they use, and why in the world would they want to use anything else? When specific postcoordinated schemes (FAST, AAT, etc.) did come up, they tended to be mentioned as something the poster had heard of but never used and didn't fully understand. I'm generalizing a bit here, but that tone was definitely present.
I don't know that I have anything concrete to say other than that I've noticed a trend of resistance to non-LCSH subject systems, but I do think that as catalogers are increasingly being asked to be metadata experts (and by that I mean metadata in a broad sense, not just traditional cataloging practice!) they'll more and more need to know about what vocabularies are out there. A huge part of my job as a Metadata Librarian is choosing among the various data structure and data content standards available for a given implementation. We're definitely past the days when one size (MARC/AACR/LCSH) fits all. The more all sorts of librarians learn about alternatives and can make good decisions about when they're appropriate to use, the better off our whole profession will be.
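For those who haven't worked with a postcoordinated scheme, the basic contrast looks something like this (my own rough faceting, for illustration only - not an actual FAST record):

  Precoordinated string (LCSH-style):
    United States -- History -- Civil War, 1861-1865 -- Personal narratives

  Postcoordinated facets (FAST-style, simplified):
    Geographic:    United States
    Topical:       Civil War
    Chronological: 1861-1865
    Form/Genre:    Personal narratives

In the postcoordinated approach, the cataloger assigns the individual facets and the retrieval system combines them at search time, instead of a cataloger precombining them into a single string at indexing time.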
Sunday, March 20, 2005
"We're not competing with Google"
I didn't respond to it at the time, but the statement has been churning around in my head ever since. Whether or not it's true depends, of course, on what one means by "competing." If we mean "attempting to do exactly the same thing," then it's pretty much true. While we're both in the information business, the ways in which we approach it are fundamentally different. And that's OK. But if we mean "fighting for the attention of users," or "fighting for the perception that we provide valuable services worth funding," then maybe we are competing with Google. The differences between libraries' missions and the ways we go about achieving them are important to us, but perhaps they're too subtle for a large proportion of the population. Certainly there are lots of folks out there who think Google can and will replace libraries, even if we think they're wrong.
So what does this mean? Well, I think it means that libraries need to continue to promote what we do and why. Not in the preachy Michael Gorman style proclaiming from on high to the masses that libraries are the cornerstone of high civilization and those who disagree aren't worth thinking about, but rather by building and delivering services that meet our users' needs. In the rapidly changing information environment, this means we do need to be rethinking how we do a lot of what we do. Let's remember our core principles of preservation, collocation, and free access, and find new ways to implement these in today's environment and for today's diverse users.
Wednesday, March 16, 2005
A DC frustration
I've dealt with this exact situation before; I guess I was blocking it out because it's SO annoying. Some folks would put this information in <dc:contributor>, and in fact several of my OAI sets do just this in their DC records. I suppose that might be OK, but the DC Contributor definition is "An entity responsible for making contributions to the content of the resource," and I'm not sure I'm comfortable calling "paying somebody to digitize this stuff and then asking another department to 'put it up on the Web'" making a contribution to the content of the resource. Some folks would put this information in <dc:publisher>, but again I'm skeptical. "An entity responsible for making the resource available" (the DC Publisher definition) does apply to the digital resource. However, we're dealing with published materials here, and the publisher of the print item can be an important access point. And we don't (nor does pretty much anybody) have a sophisticated mechanism in place for making good 1:1 principle records and linking them all together in a way that allows users to search on things meaningful to them and get meaningful results back. Putting our holding institution in Publisher in this environment would not serve our users' needs.
I started out using a hack I'd used before: put the holding information at the beginning of a <dc:source> field and add the local call number at the end so that it fits the Source definition. But then I got annoyed at using what I consider a hack, so I started digging around. The Western States Dublin Core Metadata Best Practices made up their own element (currently called "Contributing Institution") and don't map it to DC; it's one of the very few elements for which they go completely outside DC. The DC Libraries Working Group proposed a new DC element called holdingLocation in 2002, but by the time the proposal was reviewed by the Usage Board, MODS had gotten off the ground, so the Usage Board's decision was to use the MODS <location> element instead.
So the DC solution to this problem is to use an Application Profile that borrows an element from another schema. But once you start doing this, the draw of DC (simplicity!) is lost. I'm probably just going to use MODS instead. Sigh.
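For the record, here's roughly what the two approaches look like side by side. The values are made up, and the second snippet assumes an application profile that allows MODS elements alongside simple DC:

  The Source hack, staying within simple DC:
    <dc:source>Example University Libraries. Call number: ML410 .B4 1997</dc:source>

  The application-profile route, borrowing from MODS:
    <mods:location>
      <mods:physicalLocation>Example University Libraries</mods:physicalLocation>
    </mods:location>

The second is cleaner semantically, but the moment a harvester has to understand a second namespace, the "anybody can parse simple DC" argument starts to evaporate.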
Monday, March 14, 2005
Defining "librarian"
Certainly the definition of "librarian" is contextual. The American Libraries article asks some library paraprofessionals how they answer the question "Are you the librarian?" Since a patron asking that question almost certainly means "Can you help me?" rather than "Do you have an MLS?" or "Does your job title say you're a librarian?", the answer there, in my opinion, should be an emphatic YES.
But many librarians are extremely protective of this label. It represents a significant investment of time, money, and intellect in earning a professional degree, and that's certainly nothing to sneeze at. (Even if some MLS programs in this country today can't reasonably be described as "rigorous.") However, I know a number of people in jobs with "librarian" in the title who were hired under the rationale of "MLS or equivalent experience" and who do excellent work. Shouldn't the ability to perform the duties of a position be the primary criterion for hiring? I tend to think that a piece of paper bearing the designation MLS doesn't necessarily tell an employer whether an applicant is qualified.
I guess the argument comes down to whether the term "librarian" should refer to what you do or who you are, and I can see how each would be appropriate in different circumstances. I tend to believe people should demonstrate their skill and professionalism through their interactions with others and their work performance, rather than assuming an acronym and a diploma are an accurate indication.
Saturday, March 12, 2005
"Google at the Gate"
In this article, Gorman continues the dismissive style of rhetoric that has incensed so many in his previous comments on the Google project and on blogging. The tone is very much that of a person who is certain he is right and need not consider any other arguments put to him. Two quotes in particular caught my eye:
"Any user of Google knows it is pathetic as an information retrieval system..."
This quote, of course, depends heavily on the definition of "information retrieval system." The remainder of the sentence references the traditional IR research metrics of recall and precision, so it's probably reasonable to assume that Gorman is measuring the effectiveness of Google along those lines. And that's one fair way to measure. However, your random Google user is probably unlikely to measure Google according to those terms. Most information needs are for something on a topic rather than everything on a topic. We in libraries are used to (and should be!) focusing on the latter. But that doesn't mean it's the only way to design a search engine.
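To spell out the metrics at issue: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of all relevant documents in the collection that get retrieved. Roughly:

  precision = (relevant and retrieved) / retrieved
  recall    = (relevant and retrieved) / relevant

A system tuned for the "something on a topic" user can happily trade recall for precision in the first screen of results; a system supporting comprehensive research - the kind libraries care about - cannot.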
"I cannot see the threat to small libraries [from the Google digitization project], nor can I see much of an advantage."
Gorman's answer to this question stands in stark contrast to those of the other participants in the interview. The others give multi-sentence responses, addressing at least some possibilities for advantages and disadvantages to small libraries from the Google digitization project. But the style of Gorman's answer is, again, dismissive, giving the impression that he has made up his mind that the Google project is "bad" and that there is no need to consider its impact on libraries, small or otherwise. Perhaps he's carefully thought through all the issues and this quote is the result of a great deal of reflection, but there's no explanation presented, so the reader cannot know. I suspect this style of rhetoric - handing down a conclusion from on high without any explanation or support - will not prove effective for libraries as we increasingly need to explain our services and expertise to those outside the profession.