
SLA ConGrunt: News Researchers

News Researchers
Derek Willis - Research Database Editor, The Washington Post
Many of our colleagues aren't having the best of days (e.g. the staff at the Baltimore Sun and the Philly Inquirer)
Management thinking: if the reporters have access to databases, the researchers can be replaced

Crutchware - software that you come to depend on; even useful software can become a burden
No one product is perfect, or a perfect fit for every institution
A product "just for you" means throwing away the crutchware and designing your own thing; to do that, you'll have to understand how data and data products work
Being a geek, but not a programmer
The Researcher and the Data
Data is just another word for info
Example: a web page with a recipe is data
Turning static updates into automated, browsable databases
Searching is good if you have great searchers
Moving from user to producer
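
A minimal sketch of the recipe example, to show what "moving from user to producer" could look like in practice. It is not from the talk: the recipe records, table names and helper functions are all made up for illustration, and it uses Python's built-in sqlite3 module instead of a full database server so it runs anywhere.

```python
import sqlite3

# Hypothetical records, as they might be pulled from recipe web pages.
RECIPES = [
    {"title": "Crab Cakes", "servings": 4,
     "ingredients": ["crab meat", "egg", "breadcrumbs"]},
    {"title": "Corn Chowder", "servings": 6,
     "ingredients": ["corn", "potato", "milk"]},
]


def build_database(recipes):
    """Load recipe records into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recipe (id INTEGER PRIMARY KEY, title TEXT, servings INTEGER)"
    )
    conn.execute("CREATE TABLE ingredient (recipe_id INTEGER, name TEXT)")
    for recipe in recipes:
        cur = conn.execute(
            "INSERT INTO recipe (title, servings) VALUES (?, ?)",
            (recipe["title"], recipe["servings"]),
        )
        conn.executemany(
            "INSERT INTO ingredient (recipe_id, name) VALUES (?, ?)",
            [(cur.lastrowid, name) for name in recipe["ingredients"]],
        )
    conn.commit()
    return conn


def recipes_using(conn, ingredient):
    """Browse by ingredient: return the titles of recipes that use it."""
    rows = conn.execute(
        "SELECT r.title FROM recipe r "
        "JOIN ingredient i ON i.recipe_id = r.id "
        "WHERE i.name = ? ORDER BY r.title",
        (ingredient,),
    )
    return [title for (title,) in rows]
```

Once the page is stored as rows, a call like recipes_using(build_database(RECIPES), "corn") answers a question that would otherwise need a skilled searcher.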

Getting Data
Traditional databases
Web pages
RSS feeds
Scrapes
Data entry - should be the last resort
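
To make "scrapes" concrete, here is a rough sketch of pulling an HTML table into records with nothing but Python's standard library. The sample page, the TableScraper class and the scrape_table helper are assumptions for illustration, not anything shown in the session; a real page would be fetched first and would likely need more cleanup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment standing in for a real web page.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>Office</th></tr>
  <tr><td>Jane Doe</td><td>Mayor</td></tr>
  <tr><td>John Roe</td><td>Clerk</td></tr>
</table>
"""


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's cells."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    if not scraper.rows:
        return []
    header, *records = scraper.rows
    return [dict(zip(header, record)) for record in records]
```

Run against the sample, scrape_table(SAMPLE_PAGE) gives a list of dicts that can go straight into a database, with no retyping.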

Delivering Data
Blogs
Wikis - find people in your newsroom who will use it; try to focus on leaders/build word-of-mouth
Even closed wikis can eventually benefit the entire newsroom
RSS feeds
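
Delivering data as an RSS feed can be sketched in a few lines. This is an illustrative example only: the build_feed function and the entry fields are hypothetical, and it produces a bare-bones RSS 2.0 document with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, entries):
    """Return a minimal RSS 2.0 document as a string.

    entries is a list of dicts with "title" and "link" keys (an assumed shape).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical usage: new records from a research database become feed items.
feed_xml = build_feed(
    "New campaign filings",
    "http://example.com/filings/",
    "Latest filings added to the research database",
    [{"title": "Filing 1", "link": "http://example.com/filings/1"}],
)
```

Saved to a file behind a web server, output like this lets reporters subscribe to new data instead of asking for it.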

Principles to Follow
Always move forward - don't rely on Crutchware
Never turn down data - and don't let the data you do have escape
Steal from the best
Automate/DRY (don't repeat yourself) - take a little time on the front end, save time on the back end
Deploy First (i.e. it's easier to ask for forgiveness than for permission)
Be an evangelist - we're ruthless about doing our jobs, but we need to bring that ruthlessness to promoting OUR work, data products and technology

How Derek Writes Scripts / Development Tools for non-programmers
Emphasis on free, freely available and open source
Programming language: Python - reads much like English, runs on any platform, very fast, works really well
Django - Python framework
MySQL - Database
Apache - Web server
Xpdf - package that extracts the text of PDFs but retains, to some extent, the layout
GNU Wget - recursive downloads/mirroring sites (there is a specific version for Windows, as well as the standard one for Linux/Unix)
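
As one possible way to tie these tools together, the sketch below drives pdftotext (from Xpdf) and Wget from a Python script. The helper names, file names and the one-second wait are assumptions for illustration; the flags used (-layout for pdftotext; --mirror, --no-parent, --wait and --directory-prefix for Wget) are standard options, but both programs must already be installed for run() to work.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path):
    """Build a pdftotext call; -layout keeps the page layout where it can."""
    return ["pdftotext", "-layout", pdf_path, txt_path]


def wget_mirror_command(url, dest_dir, wait_seconds=1):
    """Build a polite recursive download of one section of a site."""
    return [
        "wget",
        "--mirror",                         # recursive download with timestamping
        "--no-parent",                      # stay below the starting directory
        f"--wait={wait_seconds}",           # pause between requests
        f"--directory-prefix={dest_dir}",   # where downloaded files are saved
        url,
    ]


def run(command):
    """Run a command, failing clearly if the tool is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not installed or not on PATH")
    subprocess.run(command, check=True)
```

For example, run(pdftotext_command("report.pdf", "report.txt")) would convert one file, and the same call in a loop would convert a whole folder, which is the "automate" principle in practice.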

Sometimes, you may need to do end runs around your IT dept.
We should all be developers (even if not software developers), and for that, you need to control your own box
How to get comfortable with using the tools: try it out at home/out of the office
Have a project or a goal in mind