
January 20, 2005


I've been meaning to post this for at least a week ...

It's been quite gratifying to see my notes on the Google/Stanford deal posted in various blogs. Unfortunately, there won't be a chance for me to present more info because I'm leaving Stanford in 2 weeks.

It's another internship, to last 6 months. I'm going to be pretty poor for a while and it's going to take some creative fundraising to get to either SLA or ALA this summer (or, in theory, both). And I've been warned that my day-to-day tasks will be pretty mundane. My commute won't be much fun, either (an hour away, in light traffic).

However, I think it'll be worth it. I'll be working at the Internet Archive, mostly doing QA and basic metadata work on non-web collections.

It should be fun. And kinda scary. I'll let you know how it goes.

January 18, 2005

Stanford FAQs

More Stanford info about the Google project

January 14, 2005

Bits of news

As evidenced by my lack of gushing enthusiasm, I am not attending Midwinter. The end of last year left me incredibly tired, and I just didn't have it in me to go to Boston in the middle of winter and ride buses to meetings I'd be too tired to even care about.

I do look forward to reading about Midwinter from various usual and unusual suspects, including the new official ALA blog by PLA.

In the meantime, now here's something I hope you really like [yes, it's paraphrasing a semi-famous quote]:

1) Sunshine Week

Opening a dialogue about the public's right of access to government information is the focus of Sunshine Sunday and Sunshine Week: Your Right to Know, which kick off March 13, 2005, and continue through the following week.

Participating daily and weekly newspapers, magazines, online sites, and radio and television broadcasters will feature editorials, op-eds, editorial cartoons, and news and feature stories that drive public discussion about why open government is important to everyone, not just to journalists.

I know some news librarians are already aware of this. I think it should be a big deal for Gov Docs librarians as well.

ALA has what appears to be limited involvement:

In addition to media efforts, a partnership with the American Library Association will provide the opportunity for education and community discussion of Freedom of Information issues on the local level. Sunshine Week also ties in with the Freedom Forum First Amendment Center's 2005 FOI Day on March 16, 2005 in Arlington, Va.

The Sunshine Week website will be rolling out content (hopefully soon) and will become a clearinghouse for materials on open government/FOI efforts and obstacles.

2) SLA has launched an online Legislative Action Center as part of their overall Advocacy program. From the SLA announcement, the Center is "a grassroots advocacy service for SLA members to use in learning about, and acting on, public policy matters affecting the information profession."

When visiting the Center, members can review legislation, learn how to communicate effectively with legislators, identify the appropriate elected officials and media with whom to communicate, and share views with lawmakers via targeted e-mail, fax, phone and wire service.

The SLA Legislative Action Center is also equipped with a comprehensive full-service election component, including detailed candidate bios, state-by-state voter registration forms, and absentee ballot explanations. The expected result: a more informed and engaged SLA membership that participates in the legislative and electoral processes.

The service is currently configured to support communication between members in the United States and their elected representatives. As content and technology allow, SLA will explore the integration of other nations' legislative contact systems into the Legislative Action Center.

3) A good compendium of updates on the Salinas situation from Conversational Reading. 'Nuff said (for now).

January 07, 2005

The Google deal (down on the Farm)

Edit: Caveat -- while this information was given at a staff meeting, Andrew Herkovic (see below) announced that it was a 'public forum' and that employees could discuss with people outside of Stanford any issues/facts/etc. communicated there. On that basis, I posted the following information.

Today at work, there was an all-hands meeting for all library employees who wanted to learn more about the Google/Stanford deal. The talk was conducted by Catherine Tierney (Technical Services) and Andrew Herkovic (Foundation Relations & Strategic Projects) of SUL/AIR.

Google book digitization project -- 5 partners (Harvard, Stanford, UMich, NYPL, Oxford)
Stanford has NOT made a commitment to digitize all of its books, but we "will do as much as we can for as long as we can."
No exact number of books to be digitized has been given
Oversized materials (such as atlases) were digitized in the prototype and have been discussed as part of the project; materials accompanying books (maps, fold-outs, etc.) will also be included, at least in theory

What Google will provide to the public --

  • Works in copyright won't be fully available

  • For copyrighted works -- there will be a click-through to the appropriate OCLC WorldCat record

Approximately 10% of Stanford's overall collection is clearly out of copyright; other material in the public domain (such as U.S. government documents) will be included in the project
Google will be responsible for determining the copyright status of any questionable materials, and copyright status will drive what is fully displayed
There's no special provision to fully display material in the last 20 years of copyright
Foreign language texts, including non-Roman languages, will be included

Google will digitize Stanford's material on Google's property, using its own equipment, protocols, and staff; the company has not yet disclosed in detail how the digitization process will be implemented, though it has been characterized as "industrial-strength digitization"
Google will be responsible for quality control of the scans
A format for the scans has not been decided
De-duplication is not part of the process at this time; Stanford is interested in having multiple copies of the same material across various partners
Google is being "coy" about standards and specs; minimums have been given, but little to no fixed specs
We believe that Google will be doing full OCR and indexing of everything they scan for us
Stanford may not mount everything that Google gives to us, but we won't reject scans for having less than perfect accuracy, either

Stanford will receive digital copies of its own books but won't necessarily get the scans from Google's other partners; SUL is under contractual obligations to Google, so we won't/can't give the digital materials away to other projects, such as the Internet Archive or Project Gutenberg; however, we may be able to share our copies with other educational institutions
SUL isn't sure how the digitization will impact ILL
Funding: the scanning will be funded by Google, but the transfer of books to and from Google will have separate library funding
There are currently no plans to publicize the protocols/process to outside institutions -- it will depend in part on the legal landscape that Google/Stanford faces prior to and during implementation

Factors in choosing which collections (or parts of collections) will be digitized:

  1. Current physical space of materials

  2. Percent of material out of copyright

  3. Whether the collections will end up in SAL3 [Stanford Auxiliary Library 3 in Livermore]

  4. How other projects (such as the Hoover monographs move) could be impacted

  5. Interest by publishers to make copyrighted works fully available

The plan is to start with just a few thousand books; the project will be implemented in stages
Ideally, material that is already in electronic format will not be excluded, but electronic availability may later become a factor in what material is chosen

In the short term, Stanford users will get access to the digitized texts via Google Print, just like general users -- in the long term, the scans/digital page images will be mounted on Stanford's servers with enhanced access, as part of SUL's Digital Repository

Impact on Stanford's users:
+ Materials to be scanned will be officially checked out; any books that aren't currently barcoded will have to be routed internally for barcoding before being sent to Google
+ SUL is considering arrangements for alternative access to materials that are in the process of being digitized, but there are no hard plans yet
+ Material may require metered access by Stanford users depending upon copyright issues
+ Each book that is digitized will be KEPT -- our patrons will continue to need those physical books and we will provide for them
+ We've retained the right to refuse to send for digitization any material that we believe is too brittle/fragile to survive the process

More information about the project, including a FAQ, will be available at the SUL/AIR website in the near future