
Thinking out of the Meta-Box: The Rube Goldberg Device

(Slightly tweaked, with an added example, after my morning cup o’ joe)

Many times MPOW already has the content that people are searching for. It is heartbreaking to look at search logs and see a user unable to ford the gulf of execution: we spent time and money adding the content, but the user has no way of getting to it.

The user, as I said earlier, is not broken. The user is applying skills and task knowledge to searching MPOW, but those skills and that knowledge don’t match our limitations. Our terms are broad, and their terms are specific. Instead of trying to fix the user, we need a mechanism that leverages (and weights) the MPOW metadata and tips its hat to the fact that MPOW has gathered, and continues to gather, very high quality websites, but that also acknowledges that MPOW metadata is both too sparse and too coarse to match user queries. I also think we will have to abandon the idea that we can effectively predict user search behavior through scraped metadata. Rather, I am proposing that it is more efficient to scavenge data from the Web in the wild and use the results to help the user connect with the content he or she is looking for in MPOW, content we often have but that is close to unfindable to all but a few cognoscenti.

Remember Rube Goldberg? I like to think he’d be proud of my scheme for improving search for MPOW. It’s zany but, I think, functional. In a nutshell, my idea is to configure a search engine to run each MPOW search first against the MPOW database, then send the same search out into the wild, then turn around and match the search engine’s “in the wild” results against LII items, and finally return search results in this order:

1. MPOW matches (as our search engine does now)

2. In-the-wild matches that also match LII key metadata fields, particularly identifier (URL). Think of this as essentially a federated search against both the wild Web and MPOW. This exploits a concept I first saw in Infomine–plumping item records with native metadata–but doesn’t try to predict the metadata, and leverages this era’s increasingly cheap storage, memory, and processor speeds.

3. Finally, below a disclaimer, display in-the-wild matches not matching MPOW metadata fields–in other words, beyond our selection criteria.

(added) So a user searches MPOW for Susan Sarandon. The search engine finds no matches, but goes on to the wild Web and continues the search. Up come a plethora of matches. The MPOW search engine matches against identifiers (and perhaps other key metadata?), then presents MPOW entries that match the metadata. (This isn’t a real match, but I would imagine the results would be similar to an MPOW search for movie reviews.) (How far would it process? I think that’s a question for testing to answer, but I’m guessing parsing through the first 100 results would be plenty.)
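For the code-minded, here is a rough Python sketch of the three-tier flow described above. Everything in it is an assumption made for illustration: the Record shape is not the real MPOW schema, local_search is a naive stand-in for our existing engine, web_search is any function you supply that returns a ranked list of URLs, and the example.org addresses are made up. The limit of 100 is just my guess from above, and the URL normalization (lowercase the host, drop "www." and a trailing slash) is one simple choice among many.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Record:
    """A directory entry. Hypothetical shape, not the real MPOW schema."""
    title: str
    url: str
    keywords: tuple = ()


def normalize(url):
    """Reduce a URL to host + path so trivial differences don't block a match."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""          # hostname is already lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def local_search(query, records):
    """Tier 1 stand-in: every query term must appear in title or keywords."""
    terms = query.lower().split()
    matches = []
    for record in records:
        haystack = (record.title + " " + " ".join(record.keywords)).lower()
        if all(term in haystack for term in terms):
            matches.append(record)
    return matches


def three_tier_search(query, records, web_search, limit=100):
    """Return (tier1, tier2, tier3) in the display order proposed above."""
    tier1 = local_search(query, records)
    by_url = {normalize(r.url): r for r in records}
    seen = {normalize(r.url) for r in tier1}   # avoid repeating tier 1 items
    tier2, tier3 = [], []
    for hit_url in web_search(query)[:limit]:  # only parse the first N hits
        key = normalize(hit_url)
        if key in seen:
            continue
        seen.add(key)
        record = by_url.get(key)
        if record is not None:
            tier2.append(record)               # wild hit we have selected
        else:
            tier3.append(hit_url)              # beyond our selection criteria
    return tier1, tier2, tier3


# Made-up data for the Susan Sarandon example.
records = [
    Record("Movie Review Index", "http://www.example.org/reviews/", ("film", "movies")),
    Record("Gardening Basics", "http://example.net/garden", ("plants",)),
]


def fake_web_search(query):
    """Pretend search engine: returns a fixed ranked list of URLs."""
    return [
        "http://example.org/reviews",
        "http://example.com/fan-page",
        "http://example.org/reviews/",
    ]


tier1, tier2, tier3 = three_tier_search("Susan Sarandon", records, fake_web_search)
```

In this toy run the first tier is empty, the second tier holds the movie review entry (matched by identifier alone), and the third tier holds the one URL we never selected, which is what would sit below the disclaimer. A real version would need to decide how to rank within tier 2; here it simply inherits the search engine's order.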

What do you think? Is this a crackpot scheme? (If so, is that such a bad thing?) Could this idea help MPOW? Could it help other directory-style databases? (Imagine if OPACs did the same kind of three-legged search against the full text of books.)

One last sideways comment: the value of Google Print is not that the content is digitized, but that the information is both findable and discoverable. (To paraphrase Roy Tennant, librarians like to scan, but users like to find.) It is findable in that if you are looking for a text, you will find it (whereas I cannot get to the full text of a 1950 essay by Nabokov because it is trapped in print in a copy of the New Yorker). But it is also discoverable, in the sense that if you aren’t looking for it, you’ll find it anyway, assuming you are on a hunt for information matching its terms. This is a solution that introduces its own problems–something to discuss another day–but it needs to be considered in light of how people seek information in the 21st century, assuming incidental illumination–Robert Taylor’s concept of added-value service–continues to be central to the core values of librarianship.

My next and last entry on my idea will address my predictions for performance configuration requirements for the Rube Goldberg device.
