I've just finished reading (reading, what's that? haven't done it in a while...) the final report from the AMeGA (Automatic Metadata Generation Applications) Project. I filled out the survey on which part of this report was based, and I have to admit, I wasn't optimistic about the project. The survey stated that it was meant primarily for text objects, and as someone who works heavily in non-text environments, I found this disappointing. But now that it's out, I think the report has, overall, done a good job of outlining the issues involved.
Of particular interest to me is Section 8, where proposed functionalities are listed for metadata generation applications. There are a number of very good suggestions here, often focusing on streamlining the metadata generation process - making use of automation where current technologies perform well, and making the human-generated part of the process easier. I definitely agree with the report that there is a huge disconnect today between research in this area and production systems. There is very interesting research going on, but production systems don't yet make good use of it. Right now, we still need humans in the process. I'm not opposed in principle to changing this, but that's today's reality.
The report characterizes survey respondents as "optimists" and "skeptics," based on their projections of future abilities to automate metadata creation. The report quotes several skeptics as proclaiming it simply not possible, under any circumstances, to completely automate metadata creation. I'd like to think of myself as on the fence with regard to this issue. I don't like to say "never," but I do see that generation of certain types of metadata elements will be easier to automate than others. The more we can automate, the better. I also understand the problem with evaluating automatic metadata generation applications. Few people agree on appropriate subject headings, etc., so how do we know if a generated heading is appropriate? In my opinion, the more we can expose people to the results of generated metadata, the better we can evaluate it, and the better these systems will eventually get.