Metadata is central to scholarly activity of all kinds. Whether it’s students working on term papers, or researchers writing books and articles, much of that work involves marshaling metadata towards a convincing argument.
And yet, as I have said before, I find working with metadata far more work than I wish it were. More importantly, it’s more work than it needs to be.
Consider the work I do just to be able to gather the metadata to format my citations:
- For some journals, go to site X and find articles I want to cite; download RIS data for each separate article, then use Bibutils to convert them to MODS.
- For other journals, go to site Y and find articles I want to cite; download Refer data (they don’t offer RIS) for each separate article, then use Bibutils to convert it to MODS.
- For most books, I can grab MODS data directly over SRU from the Library of Congress. Except that the data is often so poor for my purposes (missing name roles, spurious markup, etc.) that I frequently just create new MODS records by hand.
- For everything else (which is a lot in my case), hand-create the MODS records, with a little help from Emacs templates.
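To give a sense of what the first step in that pipeline actually deals with: RIS is a flat tag/value format, and the real conversion to MODS is done by Bibutils. The sketch below (my own illustration, not Bibutils code, and the sample record is hypothetical) just shows the shape of the RIS data these gateways hand back.

```python
# Minimal sketch: parse a RIS record (two-letter tag, "  - ", value)
# into a dict of tag -> list of values. Bibutils' converters do the
# actual RIS-to-MODS work; this only illustrates the input format.
def parse_ris(text):
    record = {}
    for line in text.splitlines():
        if len(line) >= 6 and line[2:6] == "  - ":
            tag, value = line[:2], line[6:].strip()
            record.setdefault(tag, []).append(value)
    return record

sample = """TY  - JOUR
AU  - Doe, Jane
TI  - A Hypothetical Article
PY  - 2005
ER  - """

rec = parse_ris(sample)
print(rec["AU"])  # ['Doe, Jane']
```

Every field is a list because RIS repeats tags (multiple `AU` lines for multiple authors), which is exactly the kind of structure that has to be mapped onto MODS name elements.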
And in order to be able to use this data, I need to store it in a central location: a bibliographic collection in my eXist XML DB. Never mind the months I spent writing code to be able to put it all to good use.
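The code in question is XQuery against eXist, but the basic task it performs can be sketched in a few lines: pull the pieces of a MODS record back out so they can be formatted as a citation. This is a rough stdlib illustration with a made-up minimal record, not my actual application code.

```python
import xml.etree.ElementTree as ET

# MODS v3 elements live in this namespace; ElementTree wants it in
# Clark notation ({uri}localname) when searching.
MODS_NS = "{http://www.loc.gov/mods/v3}"

# A hypothetical, minimal MODS record for illustration only.
mods = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>A Hypothetical Article</title></titleInfo>
  <name type="personal">
    <namePart>Doe, Jane</namePart>
    <role><roleTerm>author</roleTerm></role>
  </name>
</mods>"""

root = ET.fromstring(mods)
title = root.find(f"{MODS_NS}titleInfo/{MODS_NS}title").text
author = root.find(f"{MODS_NS}name/{MODS_NS}namePart").text
print(f"{author}. {title}.")
```

Even this toy version shows why missing name roles in the Library of Congress data matter: a formatter has to know whether a `name` is an author, editor, or translator before it can place it in a citation.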
I sometimes feel like it’s more work to use “time-saving” web gateways than to just walk over to the library, pull the journals off the shelves, and hand write some notes on what I’m reading. Does it really need to be like this?
Having started to write this, I came across a video that is purportedly an Apple promotional video from the mid-1990s. It lays out a vision for what personal computing ought to look like in the future. Interestingly enough, the video does this by dramatizing scholarly workflow. A Berkeley professor (a geographer, no less) enters his office and opens his computer. A talking head begins telling him his schedule for the day, which includes an afternoon lecture on Amazon deforestation. It seems the absent-minded professor had forgotten about the lecture, and so asks (verbally) the computer to pull up last year’s lecture notes. Not satisfied the information is sufficiently current, he asks his talking head assistant to find all recent related work. The assistant responds “only journal articles?” “Yes,” the professor responds.
I won’t recount the whole video, but suffice it to say that this is exactly the sort of seamless access to information that I’d like to see sometime before my career is over. And yet, we’re so far from being there that I often find myself frustrated.
It’s within this context that I observe a rather old debate playing out with respect to the library world. I suppose I started it by asking an innocent question in response to Kevin Clarke’s post on metadata interoperability: “what about RDF??”
From my understanding, the origins of RDF lie with work done at Apple during roughly the same time period as this video was put together. Indeed, the video is essentially all about a vision of a semantic web. It’s telling to me that Apple chose to dramatize that vision using a professor.
Yet when the subject of RDF is raised in the library community, the general response is either silence or outright hostility. I’ve yet to hear a single convincing argument against RDF, and it bothers me that the design of library standards like MODS and MADS suggests there has been no attempt to make them RDF-compatible.
And yet there is some RDF-related movement in the bibliographic world, though most of it is spurred by people coming from outside the community. There is the SIMILE project at MIT, of course. And Leigh Dodds (who had some interesting things to say in response to Kevin) is heading up Ingenta’s quite ambitious move towards RDF. That started a while back when they began serving up PRISM RSS feeds of their journal holdings, but it will deepen significantly as they move to an RDF backend.
All of a sudden I can start to imagine a different way. Instead of having to maintain my own normalized metadata (which takes a LOT of work), why can’t I just create citations that point to resources in disparate locations on the web? Why can’t I have elegant search applications that can find me the information I need, and access its metadata, without me having to visit 10 different sites, most of them poorly designed and each with its own UI eccentricities? And for the RDF community, how about ditching BibTeX (with all of its significant problems) and adapting CiteProc to support an RDF-based approach, where one formats one’s LaTeX/DocBook/OpenOffice/Word documents using an elegant citation style language and distributed RDF metadata?
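The core idea here is simple enough to sketch: a citation becomes a pointer to a web resource, and the resource's metadata lives as RDF statements about that URI, fetchable by any formatter. The sketch below emits N-Triples by hand; the article URI, the sample values, and the choice of Dublin Core properties are all illustrative assumptions, not a real record or a real CiteProc mechanism.

```python
# Sketch: describe a (hypothetical) article URI with a few RDF
# statements in N-Triples form. A citation in a document would then
# only need to carry the subject URI; the formatter resolves the rest.
SUBJECT = "<http://journals.example.org/articles/123>"
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core, for illustration

metadata = {
    "title": "A Hypothetical Article",
    "creator": "Doe, Jane",
    "date": "2005",
}

triples = [
    f'{SUBJECT} <{DC}{prop}> "{value}" .'
    for prop, value in metadata.items()
]
print("\n".join(triples))
```

The attraction over BibTeX is that nothing here is tied to one file format or one tool: the same triples could feed a LaTeX formatter, a search application, or an aggregator like SIMILE.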
Really, we need a better way! Yes, I know there are all kinds of institutional, financial, and technical barriers to getting where we need to go, but we need to get there, and it seems to me RDF is a better solution than vanilla XML. As I said in a comment to Kevin’s post:
There are some serious metadata issues that the library world needs to grapple with over the coming years. I’m thinking not only about figuring out smooth ways to integrate disparate data, but also to begin to better put it in the framework of a larger view such as the FRBR. To do that well, I think the library world needs to do a much better job of interacting with other communities that are grappling with the same issues. I sadly don’t see even a hint that this is happening with RDF.
And conversely, I would add, the library world can still play an important role in revolutionizing the web more generally; in incrementally helping to realize at least some of the vision of the semantic web. What if, for example, the FRBR became broadly used and understood in the RDF world? What if library authority data became widely used and cited far beyond libraries?