
May 31, 2005

Machine Readable

[Caveat lector: It's another Google post. Does it seem like I'm picking on Google a lot? The company has so many fascinating services and products and gee-whiz stuff going on ...]

I've been considering some of Google's services, with the thought of machine indexing and relevance. Mind you, I'm pretty lost when it comes to search engine technology or algorithms of relevance ranking beyond the general notions of such things. But there's been a confluence of observations regarding relevance as assigned by Google, so it's at the forefront of my musings.

It actually started a while ago, when Gmail became a very popular beta but some people were concerned about privacy issues regarding the machine indexing of messages received and sent from Gmail. It's not an issue I hear much about nowadays, but I haven't been keeping up with Search Engine Watch.

Yesterday, I saw Jessamyn's post about GoogleAds and relevance. (Regarding the idea of persons finding them for sale on eBay, I'm sure that many people consider Jessamyn a treasure, but she is by no means a commodity ...) As it happens, it was just a half-hour after reading a column by Robin Peek of Information Today about Google News. She writes:

Has Google created the killer news app? Google News only provides its sources' previous 30 days of content and updates the content every 15 minutes. And the service prides itself on the fact that the results are "compiled solely by computer algorithms, without human intervention." This is a feature that actually forces a person to view a more extensive array of news sources than someone would normally encounter. ...

But there is a certain randomness in which sources are presented. For example, items seem to be arranged in reverse chronological order. But on a customized section I created on the semantic Web, a March 9 item came before a March 14 item. The lack of human touch was also evident in another customized section on open access journals. My results included an article from the U.K. on how accessible the great outdoors was.

But this came to a head for me a few days ago. Recently, the New York Metro ran a story about a court case over allegations of 30-year-old accounts of child molestation at a prestigious boarding school -- the story has been somewhat prominent in the blogosphere given that the lead attorney is 1) quite famous within and outside of legal circles and 2) has admitted that he too was a victim of the abuse.

It's a very sober article, but it's displayed within what appears to be the standard template for articles on the site. Each of the 8 pages had a couple of color ads, and 5 had GoogleAds (the remaining 3 had Google Public Service Ads). Most of the Google Ads were easy to ignore but seemed semi-relevant to the topics discussed on the pages (I saw, and continue to see, ads for Internet filtering software, self-defense courses, Vienna Boys Choir merchandise [the school specializes in boy choirs]). But then I got to page 7, and I really wish I had taken a screenshot of it.

The first 3 of the 5 GoogleAds on the site related to the Michael Jackson trial where he's accused of child molestation. I was stunned and appalled. So I went back and looked at all of the advertising. I didn't see those types of ads on any other pages. The thing is, the ads were relevant, as the overall subjects are identical. But it felt crass and wrong (and yes, I know there's much worse to be found on the Web, by design and intent). The fact that this was due to no human intervention, that it was decided by machine processes, just made it downright creepy in a way that's hard for me to articulate.

Google's feedback FAQ about GoogleAds/Ads by Google explains:

'Ads by Google' are ads that connect people to information about products and services that are relevant to the content they're reading online. Google technology understands the nuances of language, and closely matches (or "targets") ads and links to the specific content of web pages. For example, if you're reading an article about favorite pasta recipes, you might see ads for related items—like different kinds of pasta, cookbooks, olive oil and so on—on the web page.

Who knew that technology could be tacky? Okay, that's probably anthropomorphizing too much. But I wonder if I was the only person to feel this way. When I checked the article again yesterday, after reading Robin's and Jessamyn's comments, there were radically different ads in the same spot. And there is a way to provide feedback on specific ads to Google. Or maybe the ads had expired and the template was refreshed with new, relevant-but-different content.

I have no good answers to this and I'm not even sure I'm asking the right questions. There's no moral to this story, and it's not much of a story. I'm just trying to figure out if I can discern a pattern ...

SLA Toronto

Woo hoo!

My schedule is here.

May 30, 2005

The CA Culinary Academy, Part II: LongNow

My previous entry about the SLA meeting at the CCA reveled in the retro feel of a working card catalog. But the focus of the meeting was very much futuristic. The guest speaker was Alexander Rose, Executive Director of the LongNow Foundation.

It was a generally cool talk/presentation [PowerPoint file]. There were pictures of the Rosetta Project [if the site is still down, here's a cached version from The Internet Archive] and the 10,000 Year Clock. But what really caught my attention was Rose's PowerPoint slide on the future of digital archiving:

Peer to peer archiving

The network is the archive…

  • All data is broken up into chunks and resides on the existing workstations within the institution.
  • No centralized expensive server farm.
  • No computer has a full copy of anything, but the network as a whole has multiple copies of everything (RAID).
  • As each workstation gets upgraded so does the network eliminating server farm upgrade costs.

More secure, cheaper, and more redundant.
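
To make the slide's idea concrete, here is a rough Python sketch of chunked, replicated storage. It is an illustration only, not anything Rose or LongNow presented: the function names, the tiny chunk size, the two-copy replica count, the round-robin placement and the SHA-256 integrity check are all assumptions of mine. Note that the "no computer has a full copy" property only holds here when a file has at least as many chunks as there are workstations.

```python
import hashlib

CHUNK_SIZE = 4   # bytes per chunk; tiny on purpose (assumed value)
REPLICAS = 2     # workstations holding each chunk (assumed value)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Break the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place_chunks(data: bytes, nodes: list[str], replicas: int = REPLICAS):
    """Spread chunks round-robin over hypothetical workstations.

    Returns (manifest, stores): the manifest lists, in order, each chunk's
    SHA-256 digest and the nodes holding it; stores maps node -> {digest: chunk}.
    """
    if not 0 < replicas < len(nodes):
        raise ValueError("need at least one replica and more nodes than replicas")
    stores = {name: {} for name in nodes}
    manifest = []
    for i, chunk in enumerate(split_chunks(data)):
        digest = hashlib.sha256(chunk).hexdigest()
        holders = [nodes[(i + r) % len(nodes)] for r in range(replicas)]
        for name in holders:
            stores[name][digest] = chunk
        manifest.append((digest, holders))
    return manifest, stores


def reassemble(manifest, stores, offline=frozenset()) -> bytes:
    """Rebuild the data from any reachable copy whose hash still matches."""
    parts = []
    for digest, holders in manifest:
        for name in holders:
            chunk = stores[name].get(digest)
            if (name not in offline and chunk is not None
                    and hashlib.sha256(chunk).hexdigest() == digest):
                parts.append(chunk)
                break
        else:
            # Every holder was offline, missing the chunk, or had a corrupt copy.
            raise LookupError(f"no reachable, intact copy of chunk {digest[:8]}")
    return b"".join(parts)


if __name__ == "__main__":
    text = b"The network is the archive"
    manifest, stores = place_chunks(text, ["ws1", "ws2", "ws3", "ws4"])
    # One workstation being switched off should not lose anything.
    print(reassemble(manifest, stores, offline={"ws2"}))
```

With two copies per chunk, the toy network survives any single workstation going offline, but two neighboring workstations going down at once can lose a chunk. A real system would need more replicas, smarter placement and some way to re-replicate when a machine is retired.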


I don't know how feasible this is in the short or long-term -- for instance, Brewster isn't about to let go of the leases for the Archive's server farm. But this could be really interesting. The redundancy element is already present in a few archiving/long-term access projects (such as Stanford's LOCKSS). What would P2P delivery of e-books or digitized texts entail? How would the standards for a digital text change? What formats would work best? What protocols for quality and authentication might be useful (or useless)?

Can you imagine using BitTorrent to download a book?

This idea has tons of drawbacks, especially when dealing with non-public domain material, but I also think there's quite a bit of potential, at least in terms of thinking about how libraries and archives move forward in delivering assets to their patrons/users. And it means that libraries and archives may have a stake in the future of P2P networking systems and on that basis should pay attention to cases like MGM v. Grokster.

We're quite far off from Rose's vision of P2P archiving replacing centralized repositories. But is it realistic? And if so, how do we get there from here?

May 24, 2005

The CA Culinary Academy Library: Kickin' it old school ...

[Caveat lector: Graphics ahead ...]

Far too many weeks ago, I went to a meeting of the SLA SF Bay Chapter that was held at the CA Culinary Academy in downtown San Francisco. Although the majority of the program took place in a basement banquet room, there was an opportunity before dinner to take a tour of the school's library. As befits a culinary academy, the collection is devoted to the techniques, sciences and arts of food, beverage and hospitality. It's a pretty cool little library ... I browsed the book collection a bit and decided to see how many volumes of M.F.K. Fisher's work the place held. But I couldn't find any of her titles. So I turned to the library assistant (and SJSU SLIS student), Daniel Hollander, to see if he could help me. He turned to the catalog:

The card catalog

A card catalog! Watch him work that drawer:

The drawer in question

I hadn't even registered the piece of furniture, let alone recognized its function. Daniel pointed me to the right LC class for Ms. Fisher and I found several of her books. I'm a really big fan of electronic catalogs/OPACs, but watching someone work the card catalog was pretty neato peachy keen.

A close-up of the drawer

May 20, 2005

Google portal page?

There is a personal Google page up.

I wish I could say I screamed in an "Eddie Izzard's coming to town and I have tickets!!!!" way, but it was more of a "What do you mean someone has paid Paris Hilton to play a fictional character???" type of scream.

Then again, I was (para)professionally scarred by Lycos' migration from "very useful search engine" to "okay portal" to "screw this, I've heard of this thing called Google". So, Google's portalesque activities tend to make me flinch a little, but that's my irrational paranoia. When I calm down, I will realize ... hey, that's nice. That's uncluttered. If I had a Gmail account, I'd sign up.

In the meantime, I'll be on the couch with some smelling salts and a fan ...

Biometrics @ Your Library

Well, not my library. I'm nowhere near Naperville (IL) PL ...

Library card? Check. Fingerprint? Really? from the Chicago Tribune [reg. req'd; here]:

Before long, patrons wanting to use Naperville Public Library System computers without a hassle will have to prove their identity with a fingerprint.

The three-library system this week signed a $40,646 contract with a local company, U.S. Biometrics Corp., to install fingerprint scanners on 130 computers with Internet access or a time limit on usage.

The decision, according to the American Library Association, makes Naperville only the second library system in the country to install fingerprint scanners.

Regarding privacy concerns: "The stored numeric data cannot be used to reconstruct a fingerprint, [Library Director Mark] West said, nor can it be cross-referenced with other fingerprint databases such as those kept by the FBI or the Illinois State Police."

The ACLU is dubious. Deborah Caldwell-Stone of ALA-OIF is cautious but not condemning.

Not that you should care, but if it came to my library: I'd probably take a personal position not to use the technology but there would come a day where I'm in downtown {wherever} and I just need to look up an address or my itinerary and I'm already late AND I'm on public transit, so I'd do the fingerprint thing to get the heck out of Dodge already, and I'd feel really guilty about letting convenience override personal conviction and I'd probably do it a few more times, but then I might just start avoiding that library just to avoid the guilt. Or not. Yes, I am a freak.

May 18, 2005

Baa Ram Mu?

Are there any readers out there who are familiar with Beta Phi Mu? If so, what are the real-world benefits of the organization?

Thank you ...

May 15, 2005

Return of the Bride of the Son of the Revenge of ...

I'm back!

It took a while to futz around with the postgres and the directories and the importing ... but it worked! And there's still stylesheet stuff to work out.

So, what'd I miss?

May 04, 2005

Guilty pleasures

It's been 3 months out of academia for me. What do I miss from the Farm?

  • Definitely the people (and I haven't visited enough)
  • The collections
  • The lectures, fora and other programs
  • The easier commute (even outside of rush hour, my commute is 3X what it used to be [20 minutes, fastest time vs. 1 hr., fastest time])
  • Being on a college campus, even one without a pool hall or a pinball machine
  • The campus bookstore
  • Being able to see Rodins on my lunch hour

But I knew I would miss that stuff. What I did not expect to miss: OCLC's WorldCat.

Yes, I know there's millions of WorldCat records on Google and Yahoo, but it's not quite the same as getting it through FirstSearch. And, curmudgeon that I am, it's not like I didn't have issues with FirstSearch ... but it was very easy and rather streamlined and you could hop from record to record to record ... in a snap.

Yeah, I kinda miss Eureka, too, but from a GovDocs angle, I had more baggage with RLIN than OCLC (try searching for annual report as a Title word search via Z39.50 in RLG if you want to see why I have baggage).

You know, it's the little things ...

May 02, 2005

Bouncy Bouncy

I've been told by highly placed sources that the Linux machine that is the lair for this weblog (as well as the web server for my lair overall) is on its last legs. A replacement is in the works but probably won't be up until next weekend, minimum. I apologize in advance for the bouncy-bouncy treatment and lack of new posts and I ask for your patience.