SLA ConGrunt: Microfilm Digitization
These are just raw notes from the session that I attended at SLA - unfortunately, they are not complete (mostly because it was held at 7:30 a.m. and I hadn't had enough coffee). Now that I think my blog is working, I'm going to go ahead and post my notes. If you're interested in other sessions, check out the SLA Con Blog.
News Division Vendors Roundtable: Microfilm Digitization
Why focus on microfilm: microfilm is the official version of "the paper on record" and has the most complete version of content
Various approaches:
DIY with their microfilm and a scanner
Partnering with non-profit/outside orgs
Use of vendors
Vendors
Barbara Beach: VP - Historical Newspapers at ProQuest
Remmel Nunn: VP - Readex (division of NewsBank)
Chris Kelly: VP - Olive Software
Scott Fisher: Account Manager - Newspaper Archive.org, Heritage Microfilm
Barb
What's New at ProQuest
Continuing on the model of Historical Newspapers
Has recently added the Hartford Courant, the Chicago Defender, the Zeeland (MI) Recorder
National Archives Publishing Company - bought the current periodical assets, but ProQuest still owns the Historical Newspapers project
National Digital Newspaper Program: LC program to digitize newspapers
There are also European efforts to digitize newspapers
Historical newspapers are becoming a major tool for researchers and scholars
Remmel
Early American Newspaper Project
Starting to digitize 1820-1900 newspapers in various states
Chris
Technology company, not a content/aggregation company, based in Israel
Digitizes and analyzes newspaper content
Does electronic replicas of newspapers and magazines
How does newspaper digitization fit into "The Universal Library" model?
Big question: how does the paper make money on its archive(s)?
Scott
Going to start with K-12 markets as well as public libraries - free and premium versions
Q&A
How long does a typical project take?
Chris: Step 1 is microfilm/content analysis to assess the condition of the microfilm
Step 2: automated microfilm readers scan the material
Remmel - one big question is what the level of tagging will be - page level, article level? Granularity is variable
Barb - determining what needs to be searched is critical
What are the economies of scale for digitization? (Scenario of $1 per page for 28K pages given)
Barb - costs may or may not come down; so much of the cost depends on the state of the source material
Chris - prices are definitely going down
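To put the scenario from the question in numbers, here is a minimal sketch of the arithmetic. Only the $1 per page and 28K pages figures come from the session; the function name and the lower per-page prices are hypothetical, included just to show what "prices going down" would mean for a project of that size.

```python
def project_cost(pages: int, price_per_page: float) -> float:
    """Total cost of digitizing `pages` pages at a flat per-page price."""
    return pages * price_per_page


# Scenario given in the session: 28K pages at $1 per page.
PAGES = 28_000

# The 0.75 and 0.50 prices are made-up illustrations, not vendor quotes.
for price in (1.00, 0.75, 0.50):
    total = project_cost(PAGES, price)
    print(f"${price:.2f}/page x {PAGES:,} pages = ${total:,.0f}")
```

A flat per-page price is itself an assumption; as Barb noted, much of the real cost depends on the state of the source material.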
Can we get reassurances from ProQuest as to its viability? Also, the digitized New Yorker (on CD-ROM) is not working.
Barb - ProQuest didn't handle the digitization of the New Yorker
Also, ProQuest is still operating, still has credit and plans to "weather the storm"
Can we learn more about the K12 program and how the content will be provided for free?
Schools and public libraries will be offered a barebones version of the content -- but there's no remote access
How much material is in copyright versus the public domain?
Barb - anyone could, in theory, digitize the historical content, but ProQuest tries to arrange a package that contains royalty and other rights and provides a revenue stream to the newspaper
Remmel - the Readex material is mostly in the public domain
Chris - Olive Software allows for in-copyright material where the rights aren't available
What kind of quality assurance is there? When mistakes are digitized, are corrections made to the digitized content?
Chris - there's an old adage: garbage in, garbage out; however, there is quality assurance, and the OCR process is considerably more sophisticated
Remmel - OCR is getting better; the efficacy of the search engine is nearly as important as, if not equally important to, the efficacy of the OCR
Barb - we will re-do a digitization if the resulting scan or OCR is unacceptably defective
How do we identify now the history that may conceivably have a market in the future?
Barb - question of the century
Chris - that's the big unknown -- the business model for this is not quite set; print subscriptions are losing money and more resources are being given to web/electronic content - the archive is going to be an important part of digitization but its exact role is unknown
Scott - film is still the preservation model; microfilm from PDF can be done
What are you vendors doing to preserve your content?
Remmel - There are typically 2 copies of scans - one stays with vendor and one goes
Barb - We are still discussing and working on a larger solution to preserving digital content, as well as microfilm
What is the typical file format?
Chris: Olive does PDF distilled into XML
Are any of you working with historical societies, universities and others to get grants to help offset the costs of digitization?
Barb - yes; we work with state libraries, universities, etc.
Remmel - yes, although a number of grants stipulate that the resulting scans should be freely available to the public/constituency of the institution
Closing thought - In this financial climate for newspapers, how does one make digitization of the core product a top priority within the organization (i.e., how to pitch to management)?