
The great digitization of '04

Not that you need to hear this from this blog -- the news is quickly rippling through the library world: Google is helping to digitize library collections. Specifically, Google has made arrangements with four academic libraries (Oxford, Harvard, Michigan and Stanford) and one public library (New York PL) to digitize part or all of their monographic collections.

Harvard is starting with a pilot project of 40,000 pieces, in part to test Google's assurances that the books will not be damaged in the scanning process (note: Google will NOT be using the type of robotic scanners that require books to be unbound in order to be scanned). Oxford will only digitize material published before 1901 and NYPL is focusing on materials that are 1) out of copyright and 2) considered physically fragile. According to the New York Times, both Stanford and Michigan will be digitizing all of their non-serial collections, a combined 15 million volumes. Also, Harvard, Stanford and Michigan will be digitizing public domain AND copyrighted material.

The digital documents will be integrated into Google Print and copies will be given to the participating library for in-house/in-OPAC use.

Lots of news coverage with lots of interesting quotes:

From one of the horses' mouths
The Chronicle of Higher Ed.
CNN's take (with the hi-larious headline)
The New York Times

And the ResourceShelf leads off with news and commentary. In writing for SearchDay, Gary Price points out that this is only one among various digitization projects.

My favourite quote comes from an AP story:

"This is the day the world changes," said John Wilkin, a University of Michigan librarian working with Google. "It will be disruptive because some people will worry that this is the beginning of the end of libraries. But this is something we have to do to revitalize the profession and make it more meaningful."

Bold statement, that. But optimism is a good thing and a lot of people (librarians, scholars, publishers, etc.) will be watching to see if the optimism is warranted beyond a certain office complex in Mountain View.

TrackBack

Listed below are links to weblogs that reference The great digitization of '04:

» I 2014 jobber alle bibliotekarer for Google ("In 2014, all librarians work for Google") from Blogg og bibliotek
Google is well underway with digitizing the printed collections of some of the world's largest academic libraries. To begin with, this concerns the part of the collections that falls outside the copyright regime. What does this mean? To begin with, it is...

Comments

I passed this news around at work yesterday, and my first reaction is excitement. I like the idea of broad digital access, and of preserving the original material as a vital source!

But I am very concerned about the preservation aspect. Once this data is acquired and the scans are made, we're left with a heap of bits, and it's not clear who's going to be maintaining that heap of bits. Preservation doesn't stop after the information leaves the page.

I don't really know yet how appropriate LOCKSS would be for this, but some sort of digital preservation standard really needs to be applied in order to maintain not only access, but the integrity of the digital source.

Hi, Seth!

Briefly, in response to your comment: Yes! Yes yes yes yes yes yes ... and yes.

What role LOCKSS and the newish Digital Resources Group will play ... heck, you'll know before I will. I think it should also be an open process; now that the deal has been announced, this should free up Stanford to open up the discussion of how these works will be produced and the standards and consequences of the protocols that are implemented.

Speaking of which, we should get together over coffee and talk about preservation and rub our hands together fiendishly at some point.

Just noticed the new URL (or maybe not so new, since I just noticed it). Congrats on getting a non-univ. host. :-)

Re: Google Print...I can't for the life of me see the downside here. The materials they will be digitizing will be 99% copyright free, so if they put them up and we don't like what they've done, we should be free to take the TXT files or whatever and do whatever we wish with them...that's actually what I'm looking forward to, the rash of creativity following this much new info in the digital world.

Things I expect to see about .5 seconds after this launches with content:

a. a Firefox plugin that acts as a screenreader, that can be set to save as MP3 for iPod use.

b. a mash-up generator that will combine literature from across the centuries into new works of art.

c. a whole lot of linguists using the database to examine patterns of language usage over time.

d. software that will scrape the database for a given set of authors/subjects and save them to your computer, ready to be thrown onto a Palm or cell phone for reading on the go.

Jason, this blog has always been university-unapproved (i.e. always on a private domain); I just got a new domain name in case I ever change hosts and to make it easier to propagate.

I don't know about that 99% estimate for Stanford, Michigan and (maybe) Harvard. There's an added little commentary from Dan Giancaterino on the ResourceShelf's story yesterday:

"Here's the bottom line for Jenkins Law Library (where I work), which btw is 200 years old. About 1% of our collection is "out of copyright", i.e. published before 1900 (my arbitrary date.) These titles have accounted for less than one-half of 1% of total uses (checkout, internal use, copying, and ILL) in the last 10 years. Digitizing these titles sounds great, but it really won't help our users very much."

Not to mention that book publishing has increased exponentially from the late 1800s until now. So if Michigan and Stanford seriously contribute most if not all of their active book collections AND storage collections, the majority of that material is probably, if not definitely, in copyright. And nothing I've read suggests that there will be sizeable resources devoted to researching copyright; either it's a known quantity, or it will be defined as being in copyright if it's published after 1922 because even checking the Copyright Registry is 1) resource-consuming and 2) not authoritative.

NYPL and Oxford are a completely different story, however.

I'm holding off on being wildly enthusiastic or wholly dismissive. I hope for the best, and if this ends up being a substantial contribution to the information commons, I'll be letting down my bun and dancing in the streets.