(Slightly tweaked, with an added example, after my morning cup o’ joe)
Many times we already have the content in MPOW that people are searching for. It is heartbreaking to look at search logs and see the user unable to ford the gulf of execution, knowing that we spent time and money adding the content but the user has no way of getting to it.
The user, as I said earlier, is not broken. The user is applying skills and task knowledge to searching MPOW, but those skills and that task knowledge don’t match our limitations. Our terms are broad, and their terms are specific. Instead of trying to fix the user, what we need is a mechanism that leverages (and weights) the MPOW metadata, tips its hat to the fact that MPOW has gathered and continues to gather very high-quality websites, but also acknowledges that MPOW metadata is both sparse and not granular enough to match user queries. I also think we will have to abandon the idea that we can effectively predict user search behavior through scraped metadata. Rather, I am proposing that it is more efficient to scavenge data from the Web in the wild and use the results to help the user connect with the content he or she is looking for in MPOW, content we often have but that is close to unfindable to all but a few cognoscenti.
Remember Rube Goldberg? I like to think he’d be proud of my scheme for improving search for MPOW. It’s zany but, I think, functional. In a nutshell, my idea is to configure a search engine to pass MPOW searches first through the MPOW database, then send the searches out into the wild, match the search engine’s “in the wild” results back against LII items, and finally return search results in this order:
1. MPOW matches (as our search engine does now)
2. In-the-wild matches that also match LII key metadata fields, particularly identifier (URL). Think of this as essentially a federated search against both the wild Web and MPOW. This exploits a concept I first saw in Infomine–plumping item records with native metadata–but doesn’t try to predict the metadata, and leverages this era’s increasingly cheap storage, memory, and processor speeds.
3. Finally, below a disclaimer, display in-the-wild matches not matching MPOW metadata fields–in other words, beyond our selection criteria.
(added) So a user searches MPOW for Susan Sarandon. The search engine finds no matches, but goes on to the wild Web and continues the search. Up comes a plethora of matches. The MPOW search engine matches those against identifiers (and perhaps other key metadata?), then presents the MPOW entries that match. (This isn’t a real match, but I would imagine the results would be similar to an MPOW search for movie reviews.) (How far would it process? That’s a question for testing to answer, but I’m guessing parsing through the first 100 results would be plenty.)
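To make the plumbing concrete, here is a minimal sketch of the three-tier flow. Everything in it is hypothetical: the index objects, their search() and all_records() methods, and the URL-as-identifier matching are stand-ins for whatever a real vendor would provide, and the 100-result cap is just my guess from above.

```python
# Hypothetical sketch of the three-tier MPOW search. All object names and
# methods are invented for illustration; no real MPOW or vendor API is implied.

def three_tier_search(query, mpow_index, wild_web, max_wild_results=100):
    """Return (tier, record) pairs in the proposed display order."""
    results = []

    # Tier 1: ordinary MPOW matches, as the current search engine returns them.
    mpow_hits = mpow_index.search(query)
    results.extend(("mpow", hit) for hit in mpow_hits)

    # Tier 2: send the query out to the wild Web, then match the returned URLs
    # (identifiers) back against MPOW records -- the federated-search step.
    wild_hits = wild_web.search(query)[:max_wild_results]
    mpow_by_url = {rec.url: rec for rec in mpow_index.all_records()}
    shown_urls = {hit.url for hit in mpow_hits}
    for hit in wild_hits:
        rec = mpow_by_url.get(hit.url)
        if rec is not None and hit.url not in shown_urls:
            results.append(("mpow-via-wild", rec))
            shown_urls.add(hit.url)

    # Tier 3: below a disclaimer, the remaining wild-Web hits that fall outside
    # MPOW's selection criteria.
    results.extend(("wild", hit) for hit in wild_hits if hit.url not in shown_urls)
    return results
```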
What do you think? Is this a crackpot scheme? (If so, is that such a bad thing?) Could this idea help MPOW? Could it help other directory-style databases? (Imagine if OPACs did the same kind of three-legged search against full-text book texts.)
One last sideways comment: the value of Google Print is not that the content is digitized, but that the information is both findable and discoverable. (To paraphrase Roy Tennant, librarians like to search, but users like to find.) It is findable in that if you are looking for a text, you will find it (in a way that I cannot get to the full text of a 1950 essay by Nabokov, because it is trapped in print in a copy of the New Yorker). But it is also discoverable, in the sense that if you aren’t looking for it, you’ll find it anyway, so long as you are on a hunt for information matching its terms. This is a solution that introduces its own problems–something to discuss another day–but it needs to be considered in light of how people seek information in the 21st century, assuming incidental illumination–Frederick Taylor’s concept of added-value service–continues to be central to the core values of librarianship.
My next and last entry on my idea will address my predictions for performance configuration requirements for the Rube Goldberg device.
What do the search engines you’re thinking of matching against think of this idea?
Oooh, if you could pull this off, I would think I’d died and gone to heaven.
I had a discussion with one search engine vendor yesterday, but I know we weren’t fully communicating (simply matching and displaying Yahoo results is not what I’m after). Today I pointed that vendor to these posts, figuring a word is worth a thousand pictures, or something. I’ve repeatedly asked the sales department of another vendor (Google) if I could speak with them about my idea, but they never call, they never write…
We have a lengthy RFI for search engines that I developed, by the way, and this isn’t on it, at least not specifically. Yet. 🙂
I know I didn’t quite get it until your update after coffee. Now I get it. My example (to see if I get it): a user searches for Susan Sarandon hoping to find her fan club address, and doesn’t know that an MPOW entry is for actors…. MPOW goes to the internet and figures out from the plethora of sites that she’s a) an actor, b) politically active, c) has starred in…, d) connected to Tim Robbins, etc. The search engine puts “actor” in to MPOW and suggests entries for that, then puts in “politics”… Is that what you meant? Isn’t there some sort of semantic/syntactic fancy sort of analysis required? Is that too hard for these folks still?
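The analysis being asked about here needn’t be fancy to start with: a crude first pass could simply count terms in the wild-Web result snippets and keep the ones that happen to be MPOW subject headings. A purely illustrative sketch, with all names invented and only single-word headings handled:

```python
# Toy term-suggestion pass: no semantic analysis, just word counting against a
# hypothetical list of MPOW subject headings. Everything here is illustrative.

from collections import Counter
import re

def suggest_mpow_terms(wild_snippets, mpow_subject_headings, top_n=5):
    """Count words across wild-Web snippets; keep those matching MPOW headings."""
    words = Counter()
    for snippet in wild_snippets:
        words.update(re.findall(r"[a-z]+", snippet.lower()))
    headings = {h.lower(): h for h in mpow_subject_headings}
    ranked = [headings[w] for w, _ in words.most_common() if w in headings]
    return ranked[:top_n]

# For a "Susan Sarandon" search, snippets that repeatedly mention a heading
# word such as "politics" would surface that heading as a follow-up MPOW search.
```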
Um. I think in practice, response time would be unacceptably slow. I say this based on what I’ve heard about federated search behavior. But it could be worth a try.
A few other approaches:
(1) Don’t do the search on the fly. Constantly index the actual content of all LII-selected sites. You still have the Goldbergian option of searching first against the meta-level terms in LII records and then, if zero results are retrieved, against the bigger index of site content (see the sketch at the end of this comment).
(2) Somehow generate a massive semantic index that matches content-level terms to cataloging-level terms. Probably would best be done by running the same indexing as in (1) but not on such a frequent basis.
(3) Offer a “try this search again in Google” option. Then get Google to rate LII-selected sites higher 😉 or to flag them in some way as librarian-recommended.
In any event, I don’t think you necessarily need to send the search out to the web at large. I would first try a search that is limited to the LII sites, because there is no point generating enormous result sets only to pare them down.
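A minimal sketch of option (1)’s two-stage fallback, assuming a record-level index and a separately maintained full-text index of LII-selected site content (both index objects and their search() methods are hypothetical):

```python
# Hypothetical two-stage fallback: metadata first, crawled site content second.

def fallback_search(query, record_index, content_index):
    """Search LII record metadata first; fall back to site content on zero hits."""
    hits = record_index.search(query)          # meta-level terms in LII records
    if hits:
        return ("records", hits)
    # Zero results: try the bigger index built by continually crawling the
    # actual pages of LII-selected sites (no live trip to the wild Web needed).
    return ("site-content", content_index.search(query))
```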
Google follows the ODP layout for its directory project. If you could crosswalk MPOW’s directory with ODP, you could probably leverage Google’s index to place a user into the MPOW directory tree. For example, searching Susan Sarandon in the Google directory gives a related category of Arts – Celebrities – S – Sarandon, Susan.
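To illustrate the crosswalk idea, here is a toy sketch: a hand-built mapping from ODP category paths to MPOW headings. The category strings and MPOW headings below are invented for the example.

```python
# Hypothetical ODP-to-MPOW category crosswalk; all paths and headings invented.

ODP_TO_MPOW = {
    "Arts/Celebrities/S/Sarandon,_Susan": "Arts > Performing Arts > Actors",
    "Arts/Movies/Reviews": "Arts > Movies > Reviews",
}

def mpow_categories_for_odp(odp_categories):
    """Map ODP categories attached to directory hits onto MPOW directory headings."""
    suggestions = []
    for cat in odp_categories:
        heading = ODP_TO_MPOW.get(cat)
        if heading and heading not in suggestions:
            suggestions.append(heading)
    return suggestions

# e.g. a Google directory hit categorized under the first path above would drop
# the user into the MPOW "Actors" heading.
```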