Thursday, April 28, 2005

Newsweek article on "tagging"

In the April 18, 2005, issue of Newsweek, "The Technologist" column is about "tagging": sites, like Flickr, that collect users' labels for things. These "things" can be absolutely anything - the great thing about the Internet is that communities can appear almost instantaneously around anything at all. These labels can then be used to generate a folksonomy. There's a lot of buzz out there about folksonomies right now on the Web, and it's well-deserved. It's cool stuff. It provides such a great sense of how REAL PEOPLE (who?) think about things.
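
For the curious, here is a rough Python sketch of how a pile of user labels might be rolled up into a folksonomy. Everything in it is an illustrative assumption: the sample taggings, the `build_folksonomy` name, and the simple count-by-distinct-user approach are made up for this example and are not how Flickr or any other site actually works.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (user, item, tag) triples as a tagging site might collect them.
TAGGINGS = [
    ("ann", "photo1", "sunset"),
    ("bob", "photo1", "Sunset"),
    ("cat", "photo1", "beach"),
    ("ann", "photo2", "beach"),
    ("bob", "photo2", "dog"),
]

def build_folksonomy(taggings):
    """Count how many distinct users applied each (normalized) tag to each item."""
    seen = set()
    tags_by_item = defaultdict(Counter)
    for user, item, tag in taggings:
        normalized = tag.strip().lower()  # treat "Sunset" and "sunset" as one tag
        key = (user, item, normalized)
        if key in seen:  # ignore the same user repeating the same tag
            continue
        seen.add(key)
        tags_by_item[item][normalized] += 1
    return tags_by_item

folksonomy = build_folksonomy(TAGGINGS)
for item in sorted(folksonomy):
    print(item, folksonomy[item].most_common())
```

The tags that many different people apply to the same thing float to the top, which is about as simple as a "how real people describe it" view can get.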

I'm enough of a skeptic to think it's not practical for libraries to switch wholesale to folksonomy-type endeavors for subject access, but surely there are ways in which we can capitalize on the wealth of relevant information being generated out there. I've been interested for some time in incorporating user-contributed information into a project I work on. My plugs for this to date have used Wikis as examples - I think I'm going to have to add folksonomies to my spiel!

Tuesday, April 26, 2005

AMeGA Automatic Metadata Generation final report

I've just finished reading (reading, what's that? haven't done it in a while...) the final report from the AMeGA (Automatic Metadata Generation Applications) Project. I filled out the survey on which part of this report was based, and I have to admit, I wasn't optimistic about the project. The survey stated that it was meant primarily for text objects, and as someone who works heavily in non-text environments, I found this disappointing. But now that it's out, overall I think the report has done a good job outlining the issues involved.

Of particular interest to me is Section 8, where proposed functionalities are listed for metadata generation applications. There are a number of very good suggestions here, often focusing on streamlining the metadata generation process - making use of automation when current technologies perform well, and making the human-generated part of the process easier. I definitely agree with the report that there is a huge disconnect today between research in this area and production systems. There is very interesting research going on in this area, but production systems don't yet make good use of it. Right now, we still need humans in the process. I'm not opposed on principle to changing this, but that's today's reality.

The report characterizes survey respondents as "optimists" and "skeptics," based on their projections of future abilities to automate metadata creation. The report quotes several skeptics as proclaiming it simply not possible, under any circumstances, to completely automate metadata creation. I'd like to think of myself as on the fence with regard to this issue. I don't like to say "never," but I do see that generation of certain types of metadata elements will be easier to automate than others. The more we can automate, the better. I also understand the problem with evaluating automatic metadata generation applications. Few people agree on appropriate subject headings, etc., so how do we know if a generated heading is appropriate? In my opinion, the more we can expose people to the results of generated metadata, the better we can evaluate it, and the better these systems will eventually get.
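
One way to put a number on the "few people agree" problem is plain set overlap (the Jaccard measure) between the headings different people assign to the same item. The sketch below is only an illustration under that assumption: the headings are invented, and nothing here comes from the AMeGA report itself.

```python
def jaccard(a, b):
    """Size of the intersection of two heading sets divided by the size of their union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as in full agreement
    return len(a & b) / len(a | b)

# Hypothetical headings assigned to the same item by two people and one program.
cataloger_1 = {"Metadata", "Cataloging", "Automation"}
cataloger_2 = {"Metadata", "Information retrieval"}
generated = {"Metadata", "Automation"}

print("human vs. human:  ", jaccard(cataloger_1, cataloger_2))
print("human vs. machine:", jaccard(cataloger_1, generated))
```

If two humans only overlap modestly with each other, a generated set that overlaps about as much with either of them is hard to call "wrong," which is exactly why evaluation is tricky.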

Wednesday, April 20, 2005

ANSI/NISO Z39.19 draft revision

Whew! I'm finally back from 3 trips in 3 weeks, and have slogged through enough email to think about the blog again. I had lots of interesting developments waiting for me when I returned - new blog fodder!

ANSI/NISO has released a draft revision of Z39.19, now titled "Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies." I haven't had a chance to read the document yet, but it sure looks interesting! From the table of contents, I'm glad to see a small section on synonym rings, as we encountered these not working the way we expected in an implementation of OracleText. At first glance, the scope of the standard seems to have expanded. There are sub-sections of the "principles" section on ambiguity and facet analysis that I don't recall being in the existing standard (but don't quote me on that!). I'm extremely interested in the section on displaying controlled vocabularies. In my opinion this is the biggest barrier to end users of systems using controlled vocabularies today - displays that completely separate the vocabulary from the search interface, requiring users to know of their existence, understand their structure, and take the time to consult them! I look forward to seeing if this draft standard can make them more understandable.
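
For anyone who hasn't run into synonym rings: the idea is a set of terms treated as equivalent for searching, with no one term preferred over the others. Here is a minimal Python sketch of that idea. The rings, the `expand_query` name, and the behavior are assumptions made up for illustration; this is not the Oracle Text API and not text from the draft standard.

```python
# Hypothetical synonym rings: each set lists terms treated as equivalent for searching.
SYNONYM_RINGS = [
    {"car", "automobile", "motorcar"},
    {"film", "movie", "motion picture"},
]

def expand_query(term, rings=SYNONYM_RINGS):
    """Return the term plus every other member of any ring that contains it."""
    normalized = term.strip().lower()
    expanded = {normalized}
    for ring in rings:
        if normalized in ring:
            expanded |= ring  # no preferred term: every member is searched
    return expanded

print(sorted(expand_query("Movie")))
print(sorted(expand_query("bicycle")))
```

A search for any one member of a ring becomes a search for all of them, and a term that belongs to no ring is simply searched as typed.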

Sunday, April 10, 2005

"Authority control in AACR3"

I recently read a paper by Deirdre Kiorgaard and Ann Huthwaite that I heard about on Catalogablog, entitled "Authority control in AACR3." The paper describes the efforts underway to address authority control more explicitly in AACR3 than in AACR2. The statement in this paper I find most interesting is this:

"The definition that is likely to be included in AACR3 is: 'the means by which entries for a specific entity are collocated under a single, unique authorized form of a heading; access is provided to that authorized form from variant forms; and relationships between entities are expressed.'"

Authority control for names certainly fulfils the collocating function described here, and, conversely, a disambiguation function by creating different headings for different people with similar or identical names. But in today's information systems it can and should fulfil another function - helping users to decide if the name heading displayed to them is for the individual they're interested in. But I believe only the first goal is served by a system where the uniqueness of a person is represented only by the form of the heading. Name authority files also don't completely disambiguate names; there are many cases of duplicate names in the authority file when no information other than what appears on a publication is available to the cataloger.

I can't help but wonder if we're missing an opportunity here to move to a structure that can more easily fulfil both goals. Information that would help a user decide if a person is the one they're interested in is frequently added to a name heading, but not always. If all of that information, plus any more that may be of use, is made available to the user in a flexible manner, rather than just the data necessary to disambiguate one name from another, the second goal would be much more easily served. Perhaps this is not the time for this sort of change to be made. I do think we as librarians and system designers should be open to changes of this sort, continuing to focus primarily on the task we want to accomplish, and leaving the mechanics of accomplishing that goal as a later step.
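
To sketch the kind of structure I have in mind, here is a small Python example in which each person gets a record holding the authorized heading, the variant forms, and a separate bag of identifying details that can be shown to the user. The class name, field names, and sample people are all hypothetical, invented for this illustration; this is not an AACR3 or authority-format data model.

```python
from dataclasses import dataclass, field

@dataclass
class PersonAuthority:
    """Hypothetical authority record: one entity, one authorized heading."""
    authorized: str
    variants: list = field(default_factory=list)
    # Extra facts shown to help users identify the person; not part of the heading.
    details: dict = field(default_factory=dict)

# Invented sample records: two different people who share a variant form.
RECORDS = [
    PersonAuthority("Smith, John, 1900-1980", ["Smith, J."], {"occupation": "composer"}),
    PersonAuthority("Smith, John, 1950-", ["Smith, J."], {"occupation": "chemist"}),
]

def lookup(name, records=RECORDS):
    """Return every record whose authorized or variant form matches the name."""
    wanted = name.strip().lower()
    return [
        r for r in records
        if wanted == r.authorized.lower()
        or wanted in (v.lower() for v in r.variants)
    ]

for record in lookup("smith, j."):
    print(record.authorized, record.details)
```

The heading still does the collocating and disambiguating, while the details travel alongside it so a user who searches an ambiguous form can tell the composer from the chemist.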

Saturday, April 09, 2005

LJ April 1 Retrospective

The Library Journal April 1 issue has been archived.

Friday, April 01, 2005

April Fools!

Be sure to check out the April 1 edition of Library Journal. I hope this gets archived somewhere. What a hoot!