Sunday, September 18, 2005

The next big thing in searching?

At a conference last week, I heard Stephen Robertson of Microsoft Research Cambridge speak about the primacy of text in information retrieval, whether for text, images, or any other type of medium. In the talk he observed that the first generation of information retrieval systems operated on Boolean principles, and that the second generation (our current systems) provides relevance-ranked lists. This may be a truism in the IR world, but it's something I hadn't thought about in these terms before. Our library systems are certainly primitive in terms of searching, and they operate on the Boolean model. But I hadn't thought of relevance ranking as the "next step" - probably because the control freak in me is suspicious of a definition of "relevance" not my own. Still, I think it's useful to look at the progression of IR systems in this way.
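
To make the distinction concrete, here is a minimal sketch of the two generations. The tiny document collection and the scoring (plain term frequency) are my own toy assumptions, not anything from Robertson's talk; the point is only that a Boolean search returns an unordered all-or-nothing set, while a ranked search orders every partial match.

```python
# Toy illustration: first-generation Boolean matching vs. second-generation
# relevance ranking. Documents and scoring are invented for illustration.

docs = {
    1: "boolean searching in library catalogs",
    2: "relevance ranking in web search engines",
    3: "library search systems and relevance",
}

def boolean_and(query_terms):
    """First generation: the set of documents containing every query term."""
    return {doc_id for doc_id, text in docs.items()
            if all(term in text.split() for term in query_terms)}

def ranked(query_terms):
    """Second generation: score each document (here by term frequency) and sort."""
    scores = {doc_id: sum(text.split().count(term) for term in query_terms)
              for doc_id, text in docs.items()}
    return sorted((d for d, s in scores.items() if s > 0),
                  key=lambda d: scores[d], reverse=True)

print(boolean_and(["library", "search"]))  # unordered, all-or-nothing: {3}
print(ranked(["library", "search"]))       # ordered, best match first: [3, 1, 2]
```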

So what's the third generation? Where are we going next? I think the next step is grouping in search results. Grouping is where I see the power of Google-like search systems merging with library priorities like vocabulary control. Imagine systems that allow the user to explore (and refine) a result set by a specific meaning of a search term that has multiple meanings, by format, or by any number of other features meaningful to that user for that query at that time. I picture highly adaptive systems far more interactive than those we see today. I don't believe options for search refinement alone go far enough, since they still require the user to deduce patterns in the result set. I believe systems should explicitly tell users about some of those patterns and use them to present the result set in a more meaningful way (see the sketch below). Search engines like Clusty are starting to incorporate some of these ideas. It remains to be seen whether they catch on.
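
Here is a small sketch of the kind of grouping I mean: the system notices a pattern in the result set (a facet attached to each hit) and presents the results organized by it instead of as one flat list. The result records and the "format" facet are invented for illustration; a real system would have to discover which facets matter for a given query.

```python
# Minimal grouping sketch: turn a flat result list into facet-based groups
# so the user can see and refine by a pattern the system has surfaced.
from collections import defaultdict

results = [
    {"title": "Mercury (planet) overview",      "format": "article"},
    {"title": "Mercury (element) safety data",   "format": "article"},
    {"title": "Freddie Mercury live in 1986",    "format": "video"},
    {"title": "Mercury pollution measurements",  "format": "dataset"},
]

def group_by(hits, facet):
    """Group a flat result list by one facet value."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[facet]].append(hit["title"])
    return dict(groups)

for value, titles in group_by(results, "format").items():
    print(value, "->", titles)
```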

FRBR assumes this sort of grouping can be provided, using the different levels of its Group 1 entities. Discussions of FRBR displays frequently describe presenting Expressions by language for textual items, by director for film, or by performer for music, allowing users to select the Expression most useful to them before viewing Manifestations. What's missing is how the system knows which bits of information are relevant for distinguishing between Expressions, since these will differ across different types of materials, and sometimes even within similar types of materials. We have a ways to go before the type of system I'm imagining reaches maturity.
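
A rough sketch of the problem: which attribute a display should use to tell Expressions apart depends on the kind of material. The records and the rule table mapping material type to a distinguishing attribute are my own illustrative assumptions, not anything FRBR itself defines; the hard part a real system faces is building that mapping automatically.

```python
# Hypothetical FRBR-style display: choose the attribute that distinguishes
# Expressions of a Work, varying by material type. All data is invented.

expressions = [
    {"work": "Hamlet", "type": "text",  "language": "English"},
    {"work": "Hamlet", "type": "text",  "language": "German"},
    {"work": "Hamlet", "type": "film",  "director": "Branagh"},
    {"work": "Hamlet", "type": "music", "performer": "Danish National Symphony"},
]

# Assumed rule table: which attribute matters for each material type.
DISTINGUISHING_ATTR = {"text": "language", "film": "director", "music": "performer"}

def expression_label(expr):
    """Build the label a display might show for this Expression."""
    attr = DISTINGUISHING_ATTR[expr["type"]]
    return f'{expr["work"]} ({expr["type"]}: {expr[attr]})'

for e in expressions:
    print(expression_label(e))
```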
