As I was heading out to the fog and friction of ALA I learned a fine young library Turk had come up with a clever idea for converting RSS feeds from My Place of Work into MARC format, for import into library catalogs. I saw a couple of issues, including the lack of coordination with us, the content providers, and posted accordingly. David Bigwood quickly responded, but I see he didn’t quite grok me.
Headed towards ALA, I didn’t have time to fully explain myself, so let me begin. David et al., listen to me. I’m not against sharing content or MARC format or on-the-fly conversions. But what I need you to hear–listen up!–is that the idea of importing LII items into catalogs *without a dynamic upgrade plan* is 19th-century. I think this is conquerable, but please hear me out on the problem.
MPOW item records gain their value from their constant grooming. In an important sense, we are not producing item records so much as an ongoing service. We are all about the updating. That’s what differentiates us from the rest of the Web–our items are groomed, vetted, and updated. It’s the nature of the beast. Every quarter, for every three items we add, we delete one and update two, and that’s what makes us so hot. To place an MPOW item into a catalog without an update plan is like invading a country without an exit plan. Not that anyone we’re familiar with would ever do that. 😉
If you want the Killer App, think of a way to keep the items updated. We’re listening. We had a discussion about this a year ago with California Digital Library, with whom we have a content-sharing relationship for several hundred items. Ultimately they decided that we should routinely export the record set as an OAI/MODS-compliant XML dump. No biggie; this will be easy in our new system, but when we agreed to do this I felt an opportunity slip by. Maybe we can hear from Jon Legree, a former student of mine (I am so proud to say!), who has done some interesting work with RSS and LII.
Talk to us. Really. RSS is easy. It’s well-known. It’s something a lot of us understand. If you can go from sucking in a record one time to staying current with its status, then you will have cracked the code. Just don’t think of catalog records as representing books that you buy once… think of them as journal volumes, ever changing.
Otherwise… you’re strapping jet fuel rockets on a surrey.
Feel free to ask more questions!
A possible starting point for the “Killer App”
NOTE: I can’t actually write this stuff, I just think I know enough about it to think that this could be done by someone who would know how!
The RSS feed from LII lists each new record as a URL built on the unique LII record number, along with a title and description:

<rdf:li rdf:resource="http://lii.org/search?goto=025969"/>
<rdf:li rdf:resource="http://lii.org/search?goto=007825"/>
<rdf:li rdf:resource="http://lii.org/search?goto=026533"/>
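As one hedged illustration (not LII's own code), here is how a script might strip those record numbers out of the feed. The fragment below simply reuses the three example URLs above; a real script would fetch the live feed first.

```python
import re

# Sample fragment mirroring the rdf:li lines of the LII RSS 1.0 feed.
feed_fragment = """
<rdf:li rdf:resource="http://lii.org/search?goto=025969"/>
<rdf:li rdf:resource="http://lii.org/search?goto=007825"/>
<rdf:li rdf:resource="http://lii.org/search?goto=026533"/>
"""

def extract_item_ids(rss_text):
    """Pull the unique LII record numbers out of the goto= URLs."""
    return re.findall(r'search\?goto=(\d+)', rss_text)

print(extract_item_ids(feed_fragment))
# ['025969', '007825', '026533']
```

A regular expression is enough here because the goto= URL shape is the only thing being relied on; a stricter script could parse the RDF properly instead.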
What would be really nice is if that reference number, for instance 007825 (which refers to a site about Canadian public holidays), were an indexed field in LII. The LII record for 007825 tells me that the information was last updated on “Jun 30, 2005,” but without being able to easily query for just record 007825, finding that last-updated date is currently a bit messier. It can be done by using the “permanent” link for commenting on a record, e.g. http://lii.org/search?comment=007825. Since this is (for now) a standard URL built from the item number I can strip from the RSS feed, I can automate checking the comment URL through a link resolver. A fancy script could review the results for the string following the phrase “last updated” to check the date. Since I know the date of the RSS feed and when I added items to my catalog, I can set up checkpoints at various times to see whether a record has been updated. I can also check for deleted records by seeing whether the comment URL for a specific item resolves back to the front LII page. Note that I am not checking whether the link the LII record points to still resolves (it may still be a valid URL but have become an inappropriate site), but rather whether LII still has a pointer to that resource.
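A rough sketch of that “fancy script” idea follows. The page text and the exact “last updated” phrasing are assumptions about LII’s markup, which may well differ; the sample HTML string stands in for a fetched comment page.

```python
import re
from datetime import datetime

def last_updated(page_html):
    """
    Scrape the 'last updated' date out of the HTML returned by a
    comment URL like http://lii.org/search?comment=007825.
    Returns a date, or None if no date is found (e.g. the request
    resolved back to the front LII page, suggesting a deleted record).
    The date format 'Jun 30, 2005' follows the example in the post.
    """
    m = re.search(r'last updated[:\s]*([A-Z][a-z]{2} \d{1,2}, \d{4})',
                  page_html, re.IGNORECASE)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%b %d, %Y").date()

# Stand-in for the comment page of record 007825:
html = "... Canadian public holidays ... last updated Jun 30, 2005 ..."
print(last_updated(html))  # 2005-06-30
print(last_updated("<html>front LII page, no record here</html>"))  # None
```

Comparing the returned date against the date a record was imported into the catalog is then enough to flag items needing a refresh.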
In the new LII, the item ID is an indexed field–having written that, I just wrote the developers to make sure we are actually indexing that field! But it is a *field,* and it is indexable, along with several other key bits of information, such as creation date, last update, HTTP error code, publication date, etc.
Note that by starting with the RSS feed, rather than the raw item, you’re missing out on a lot of field information, i.e. metadata. One thing we can do is ask you folks how to make our RSS items more robust. For your needs, though, you may want to think about bypassing RSS altogether, even if its advantage is that it’s RSS and it’s easy. Basically, what we need is a simple way to query for last-updated items that takes full advantage of our metadata–quite a bit of which is not stored in the item record itself in the new system, but in an associated table (that’s true for LC subjects, for example). Maybe do both? How can we communicate back to the cataloger community for input on this?
We roll out the new site sometime this month (God willing). At that point you should look at our feed and our live items all over again, and also get a walk-through from me of the item innards. Additionally, we have a LII-to-MODS mapping you would find very illuminating–though note it’s limited to item fields, just to keep it simple. I’ll post it on FRL by tomorrow, along with a sneak preview of the new item detail in the forthcoming LII. It’s meta-licious.
Sounds exciting! I look forward to checking out the new structure, and hopefully talking further with David at Catablog about the idea of MARC -> RSS instead of the other way around. That might be a way to start from a cataloging viewpoint: how could MARC information be delivered through RSS? Maybe sites that wanted to participate in something like this would offer two feeds.
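One way to picture the MARC-through-RSS idea is an RSS item carrying a few MARC fields as namespaced children. The namespace URI and the tag names below are invented purely for illustration; nothing like this was standardized, and a real effort would want community agreement on the vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for MARC fields embedded in an RSS item.
MARC_NS = "http://example.org/marc-in-rss/"  # invented for illustration
ET.register_namespace("marc", MARC_NS)

item = ET.Element("item")
ET.SubElement(item, "title").text = "Canadian public holidays"
ET.SubElement(item, "link").text = "http://lii.org/search?goto=007825"
# Rough stand-ins for MARC 245 $a (title) and 856 $u (electronic location):
ET.SubElement(item, f"{{{MARC_NS}}}df245a").text = "Canadian public holidays"
ET.SubElement(item, f"{{{MARC_NS}}}df856u").text = "http://lii.org/search?goto=007825"

print(ET.tostring(item, encoding="unicode"))
```

A participating site could serve this richer feed alongside its plain RSS, which is one way the “two feeds” notion might play out.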
I wonder if this sort of information-delivery mechanism could ease the kerfuffle over adding death dates to authority records.