Monday, August 29, 2005

Google Print and Fair Use

Having thought some about the "copying" aspect of Google Print, it would now be prudent to think about exceptions to the exclusive right of copyright holders to reproduce a work. Google's stance seems to be that their activities fall under the scope of the Fair Use exception to copyright. Fair Use is far from a straightforward concept, and comparatively few cases have served to clarify the issue. Here's the text of section 107 of the Copyright Act, which describes the fair use exception:

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

Note that whether or not the copyright owner objects is not a factor to be considered when determining fair use. The copyright owner could file a lawsuit, but the fair use claim is evaluated against these factors, not against the owner's wishes.

So how does Google Print stack up against the four factors?

(1) Purpose and character. Commercial vs. educational is singled out here, and certainly Google's use is commercial. But that's not the only purpose or character allowed to be considered. A lawyer for Google could claim that their service, meeting people's information needs and directing them to a copyright holder when a work meets that information need, is a Good Thing. They could then go on to argue that making money off this is secondary, but lots of folks wouldn't believe that.

(2) Nature of the copyrighted work. This is hard to pin down due to the scope of what's being digitized. Books that have been out of print for 45 years and aren't widely available in the used book market would evaluate differently according to this criterion than Harry Potter. (Yes, research libraries collect fiction too.)

(3) Amount of the work. Again, tricky. Google is digitizing (copying) the entire work and, presumably, using the entire work to create their index. The counter-argument seems to be that they're only showing a small part to users of their service, but I don't believe that applies here. The exclusive right is the copying part, not what you show to other people.

(4) Effect on the market. Here is where only showing snippets to end-users comes into play. Certainly the effect on the market is potentially severe if one could download, print, or read a whole book from Google instead of purchasing it. The recording industry feels that way about file sharing, but there are many who disagree, claiming file sharing actually stimulates purchasing. (Sorry, no citations right now, but there are gobs of studies out there on both sides of this issue.) I imagine Google would claim that by showing snippets they're telling users about resources they didn't know about before, and are thus adding to the market. This will be an interesting argument to follow.

My conclusion is that the fair use claim is far from a slam dunk in either direction. Personally, I'd love to see this litigated (and found in favor of Google!) to start what I consider to be much-needed reform in copyright law.

IANAL. Any misinterpretations or flawed analyses are entirely mine, and the result of me trying to pretend I know something about this stuff.

Sunday, August 28, 2005

Musings on the state of copyright

The recent brouhaha (wow, I think that’s the first time I’ve ever written that word down!) over Google Print has me thinking about copyright law. I am not a lawyer. I have no legal training or education. I have picked up a bit about copyright law while working in the area of digital libraries for the past five years, however. I think what I think I know is accurate, but hey, I'm wrong a reasonable amount of the time.

The publishers who have objected to the Google Print project say that the project violates copyright law by scanning the books in question (copying, which is the first exclusive right granted to copyright holders by section 106 of U.S. copyright law) to index them. So how is this different from Google’s Web index? Well, in creating the Web index Google caches Web pages too. Caching may not actually be the right word there – Google probably creates a copy more actively, intentionally, or permanently than Random J. User’s Web browser does. One could argue there’s some sort of difference between Google's caching of Web pages and its scanning of page images from printed books, but it seems to me this difference is a matter of degree rather than of real substance. So if the digitization for Google Print is a copyright violation, does that mean all Web search engines are violating copyright?

Let’s take this exercise one step further. Indexes have been around for a very long time: the Readers’ Guide to Periodical Literature, Academic Search Premier, the MLA International Bibliography, and so on ad infinitum. I admit to being ignorant as to whether these more traditional indexes tend to operate with the blessing of the copyright holders (although many of them are actually produced by publishers to cover their own content), but surely not all of them do, and the library world isn’t exactly abuzz with these copyright holders crying foul. One difference is that the processing that creates these more traditional indexes is entirely an intellectual exercise (although this may no longer be true today!). Any “copying” of the work done to create the index happens purely in a person’s head. Is this difference one of degree or of substance?

To go yet another step further, library catalogs use a copyrighted item to create a new representation – is there an argument there that catalog records are derivative works? Obviously we’re in danger of descending into the ridiculous here, but the need for some sort of balance is clear. The concept of balance between the rights of the creator of a work and the benefit to the public good from its use is inherent in copyright law. Too bad the specifics of maintaining this balance are in language that languishes far behind current technologies.

I think it will take a copyright challenge to a large for-profit company like Google (rather than to even the most resource-rich library) to overhaul copyright law and bring it up to date. Google seems to me to have the desire and the resources to present a reasonable defense, and to persist through a legal battle rather than solving the short-term problem through an agreement with publishers. But, as I’ve said, I’m wrong a reasonable amount of the time.

Thursday, August 11, 2005

A billion and one, a billion and two...

The OCLC folks are all abuzz with today's addition of the billionth holding to WorldCat, as reported all over. This is obviously an enormous milestone for OCLC and for libraries in general. Kudos are in order for all of us, I think!

The union catalog has transformed the way libraries provide access to their material. A billion holdings in one database seems to me to be proof positive of that. But OCLC Research staff and many others, researchers and practitioners alike, aren't content with the functionality our current union catalogs offer. The enormous wealth of data represented by those one billion holdings has the potential to be used in innumerable ways. I believe OCLC's FRBR activities are excellent examples of the sorts of creative things we can do with this data to better serve our users. We've made huge strides in access to materials, yet we have many miles to go.

UPDATE: I discovered today that, unfortunately, WorldCat's one billionth holding is a book on The Monkees. We're going to have a country of librarians walking around for two weeks now with that damn theme song stuck in our heads!

Wednesday, August 10, 2005

Keeping up with technology

Podcasting, Web services, RDF, Flickr, Ebooks. Buzzwords, right? All of these are extremely useful technologies or applications, but I don't actually use any of them. Each has its place, each is good at solving certain types of problems. None, of course, is a magic wand that makes everything in life easier.

I follow a number of library- and technology-related blogs. Many of them hype a certain technology that is meaningful to the blogger for their particular needs. I learn a huge amount from these bloggers, the information they provide, and the fervor with which they provide it. But rarely do I go out and try any of the technologies being described just to see what they are. A few pique my curiosity and I go check them out, but for the majority I just mentally file the information away for when I have a problem the technology in question solves. There's just too much going on in this environment right now to really delve in and learn everything new that comes along. Each of us picks up on the emerging technologies most relevant to us in our personal or professional lives. Other technologies only become relevant to us later, but hearing about them before we need them reminds us of the vast range of possibility out there. Sharing our experiences helps others both to adopt these technologies right away when appropriate and to adopt them later as the need grows.

Tuesday, August 02, 2005

To each their own "metadata"

I was introduced to someone today as the "Metadata Librarian," and received a reaction I seem to get a lot: "Oh, metadata, huh? Someday I'll understand that." On my optimistic days, I want to respond "Would you like to go get a cup of coffee and chat?" On my cynical days, "You've got an opportunity here to learn something new! Take it!"

Everyone has their talents and areas of difficulty. We're all really good at some things and equally bad at others. Me, I'm completely spatially inept. It once took me 3 hours to put together a futon frame (with instructions). I'm fine with that, because I know my talents lie elsewhere, although I do often think it would be nice to be handy. Despite my lack of innate talent in some areas, I've never thought I simply can't learn any of it. Little by little I'll learn to fix things around the house. I'll never be able to paint with any level of inspiration, but with a whole lot of practice I might be able to use color effectively or produce a still life that is recognizable. One might think metadata is uninteresting. That's cool. I find a lot of stuff out there uninteresting. But don't think it's unlearnable.

Part of the problem here is that "metadata" isn't a monolithic concept. Depending on one's perspective, it can mean virtually anything. For lots of people, all they need is descriptive metadata, and maybe just the version of qualified Dublin Core that their content management system provides. GIS specialists delve deeply into an area of metadata many know very little about. For many, text encoding is the metadata world, one of extremely rich depth and subtlety. I had an interesting conversation recently with a colleague about the definition of "structural" metadata. By some definitions, TEI markup is structural metadata, indicating the structure of the text by surrounding that text with tags. Does that same logic apply to music encoding? Music markup languages specify the musical features themselves, rather than "surrounding" them with metadata. But certainly there's some similarity to text markup. The boundary between structural metadata and markup isn't the same to everyone. Similarly, there are times when I use the word metadata to refer to something that might more accurately be called "data," and times when I use it to refer to something that might be "meta-metadata."

All of these views are valid. I'm constantly reminding myself of this. Often when my first reaction is that someone doesn't get it, it's really their view not quite meshing with mine. It's important that we have some common terminology and meanings, but I believe there's room for perspective as well. I can get better at my job if I listen more closely to these perspectives.