darcusblog » 2007 » May - geek tools and the scholar

Archive for May, 2007

RDFa for Ruby on Rails

Posted in Uncategorized on May 29th, 2007 by darcusb – Comments Off

A note on a new RDFa plug-in for Ruby on Rails. It seems like it should provide a convenient way to expose data in a Rails app to the semantic web.

The new OpenDocument metadata proposal that we’ve just about wrapped up, BTW, borrows some ideas from RDFa for tagging document content as RDF triples. It would be most excellent to see users, at some point, able to copy-and-paste RDFa- or microformat-enhanced content from a web page into an ODF document and have it retain the metadata as it travels.
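To make the idea of tagging content as RDF triples concrete, here is a minimal Python sketch using only the standard library. It is not a conforming RDFa processor. It handles a drastically simplified subset: only about and property attributes, with no CURIE expansion and no rel/resource chaining. The sample URI and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny XHTML fragment with RDFa-style attributes (hypothetical data).
XHTML = """<div xmlns="http://www.w3.org/1999/xhtml"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/book/1">
  <span property="dc:title">Some Title</span>
  <span property="dc:creator">Jane Doe</span>
</div>"""


def extract_triples(xhtml: str) -> list[tuple[str, str, str]]:
    """Collect (subject, property, literal) triples from about/property attributes.

    Simplified: property names are left as unexpanded CURIEs, and the
    subject is inherited from the nearest ancestor carrying an about attribute.
    """
    root = ET.fromstring(xhtml)
    triples = []

    def walk(elem, subject):
        subject = elem.get("about", subject)
        prop = elem.get("property")
        if prop is not None and subject is not None:
            # The literal value is the element's full text content.
            triples.append((subject, prop, "".join(elem.itertext()).strip()))
        for child in elem:
            walk(child, subject)

    walk(root, None)
    return triples


for triple in extract_triples(XHTML):
    print(triple)
```

Content that carries its statements this way could, in principle, be pasted into a document without losing them. That is the travelling-metadata scenario described above.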

More XMP Confusion

Posted in Uncategorized on May 26th, 2007 by darcusb – Comments Off

More uncritical praise for Adobe’s XMP. To quote the most problematic parts, first:

Luckily Adobe is not as protective and closed about standards as some software heavyweights, so when it drew up its own successor format, it chose to base it on XML and publish the specification. That spec is XMP: an XML for generic image metadata.

Ahem, once again this completely misses the RDF connection. XMP is not fundamentally an XML format; it is an implementation of RDF. That is to say, the value XMP offers is a general metadata model (borrowed from RDF) that allows great flexibility in representing whatever metadata you need to describe. That it has an XML serialization format is just gravy.
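The following Python sketch illustrates the point that XMP is RDF first and XML second. It reads a trimmed XMP packet with nothing but a generic XML parser and treats the contents as subject/predicate/object triples. The packet is hypothetical, with the xpacket wrapper omitted. The reader handles only literal-valued properties, not containers such as rdf:Seq or rdf:Alt.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A trimmed, hypothetical XMP packet: the payload is ordinary RDF/XML.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xmp="http://ns.adobe.com/xap/1.0/">
    <rdf:Description rdf:about="" xmp:CreatorTool="ExampleTool 1.0">
      <dc:format>image/jpeg</dc:format>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def simple_triples(packet: str) -> list[tuple[str, str, str]]:
    """Read literal-valued properties as (subject, predicate, object) triples."""
    root = ET.fromstring(packet)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about", "")
        # Property attributes such as xmp:CreatorTool="..." (skip rdf:* syntax attributes).
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                triples.append((subject, name, value))
        # Property elements whose value is plain text.
        for prop in desc:
            if len(prop) == 0 and prop.text and prop.text.strip():
                triples.append((subject, prop.tag, prop.text.strip()))
    return triples


for triple in simple_triples(XMP_PACKET):
    print(triple)
```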

The second issue with this characterization is that it completely misses that XMP is currently, in effect, a proprietary subset of an open standard. As I wrote on another blog, because XMP is a rather bizarre subset of RDF defined by Adobe very early in the life of RDF, there is a whole lot of valid RDF that is not valid XMP. For this reason, while vanilla open source RDF tools can easily read XMP, they cannot reliably write it. Hence the reliance, for the most part, on Adobe’s tools, which are C++ only.


Adobe has been shipping XMP-aware apps since 2001, but up until now we have not seen an open source application step up and tackle XMP head on, providing read/write support, the core namespaces, and creation of custom namespaces. In fact, the apps that did address XMP all stopped at reading the data, not even doing anything useful with it.

The article implicitly criticizes open source developers for the lack of support for XMP, when the real blame lies with Adobe. If XMP supported RDF proper (the full model), I am confident there would be much more widespread support for both reading and writing XMP in files.

Just to give you a sense of why this matters: a developer of a popular open source RDF library added experimental support for XMP serialization a while back. I emailed him to explain that he would need to severely constrain the modeling and serialization in ways he was not then doing. He basically told me it would be too much work for too little payoff. I don’t blame him in the least.


The XMP spec is open, and better still, extensible through XML namespaces.

I’m not really sure what definition of open this author is choosing to use. Yes, the spec is published and can be freely implemented. But it is not an open standard in the sense that it has no vendor-neutral standards body overseeing its evolution beyond the state-of-the-art in 2000.

I really do admire Adobe’s foresight in building and implementing XMP. Having a generic and flexible metadata system that works across file formats and applications is a really visionary goal, and they have largely achieved that.

But to really realize these benefits beyond Adobe applications, Adobe needs to loosen up XMP. I’d like to see them dedicate the spec to the W3C as providing a basis for embedding metadata in files, and work with the RDF experts there to bring it up-to-date with current best practices.

Such a move would no doubt present some short-term difficulties for Adobe itself, given the large installed base of XMP-enabled Adobe applications, but it would ultimately grow the market for enhanced metadata.

The idea that our word-processing, spreadsheet, image, presentation, audio, etc. files should contain rich metadata that travels with the file is one whose time has come. It is my hope that the OpenDocument metadata work will help contribute to this goal, but we really need for this metadata to be able to travel across disparate formats. I’d love to see Adobe help realize this goal, but if it doesn’t, perhaps the W3C ought to consider doing so on its own.

Citations and Fields

Posted in Uncategorized on May 26th, 2007 by darcusb – Comments Off

I’ve been having an interesting discussion with people involved in implementing citation processing in Zotero. This is the functionality that allows you to add a citation to your Word or OOo Writer document and have it, along with the bibliography, automatically generated.

They’ve hit a rather large conceptual and practical stumbling block: how to implement note-based citations. If a user adds a citation to the document and it is automatically rendered as a footnote, is that object a citation in a footnote, or a citation that is simply rendered as a footnote?

Use Cases

Allow me to explain with some use cases:

Basic Case

A user starts a new research paper. They select a footnote-based citation style. They add citations to the document, and each of them is automatically rendered as a footnote.

They then realize they need to use a different citation style, and choose instead an APA in-text author-date style. The footnoted citations are then automatically moved into the text in the proper form.

Complex Case 1

User wants to add a footnote to the document and include one or more citation references in it. They add the footnote, and then add both their commentary and the related citations. If they switch to a non-note-based citation style, this footnote remains a footnote; only the citation rendering changes.

Complex Case 2

User wishes to add commentary about the citations in the note to that note (as opposed to in the body text). User clicks in the body of the footnote and begins typing. If they switch to a non-note-based citation style, this footnote also remains a footnote.


Citations can occur either in the main body text or in notes. Whatever the citation style, citations in notes (or at least their rendering) differ from body-text citations, because they occur in the context of note-based commentary. Their position in the note is thus not an artifact of the citation style, but rather fundamental to the content. Both the content of that note and its citations will remain in the note regardless.
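The distinction between a citation in a note and a citation rendered as a note comes down to one bit of state per citation: whether the user put it in a note or the style did. Below is a minimal Python sketch of that rule. The names Citation, in_user_note and placement are hypothetical; they do not come from Zotero or any word processor.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    refs: list[str]      # e.g. ["Doe, 1999"]
    in_user_note: bool   # True only if the user explicitly placed it in a note


def placement(citation: Citation, note_style: bool) -> str:
    """Return where a citation renders: 'user-note', 'generated-note', or 'in-text'."""
    if citation.in_user_note:
        # Complex cases 1 and 2: the note is content, so it survives any style switch.
        return "user-note"
    # Basic case: a body-text citation becomes a note only as an artifact of the style.
    return "generated-note" if note_style else "in-text"


body = Citation(["Doe, 1999"], in_user_note=False)
commentary = Citation(["Smith, 2000"], in_user_note=True)
for style in (True, False):
    print(style, placement(body, style), placement(commentary, style))
```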

There is no disagreement about the basic case. We all agree citations should be automatically footnoted in note-based citation styles. This is not some theoretical problem. Some fields use both note-based and in-text author-date styles, and absent automation, users wishing to switch from one to the other would have to manually move every single citation in and out of their notes, a tedious process. We all agree it’s a major shortcoming of existing applications (like Endnote) that they do not manage this issue for their users.

Where we diverge is on implementation details highlighted in the complex cases.

Complex Case 1 illustrates the clear distinction between the two: it is a citation within a footnote, rather than a style-dependent footnoted citation.

Complex Case 2, however, describes a likely scenario in which the user, in essence, wants to convert a footnoted citation into the first form.

So two different issues of concern to me:

First, what should the user experience be here when a user would like to add commentary to citations?

Forget about footnotes for a moment. Consider short comments in in-text citations. Say I want to write (Doe, 1999; see also Smith, 2000, chapter 2). Can I do this? If so, how? If I do, how do I select the citation source?

Note: my questions above do not necessarily presume any answers. I am asking, though, because users sometimes do use notes in in-text citations.

Second, how should this be encoded in document formats (specifically ODF and OOXML) such that users can be confident of some acceptable level of interoperability in citations across different applications?

The debate we’ve been having touches on both dimensions of the question, but a bit more on the latter. In short, should a citation field in ODF or OOXML be allowed to contain a footnote or endnote, or must the citation always be wrapped in the note?

Allow me to illustrate using the new text:meta-field from the ODF metadata work. Let’s imagine a multi-reference citation with an author-date style. It might be done like so:

<text:meta-field xml:id="citation-1">
  (<text:meta-field xml:id="citation-1-r1">Doe, 1999</text:meta-field>;
  <text:meta-field xml:id="citation-1-r2">Smith, 2000</text:meta-field>)
</text:meta-field>

So we have a nested field. These fields are then hooked up (via a binding that uses the xml:id) to some RDF/XML in the file package.

To a user, this would display like:

(Doe, 1999; Smith, 2000)

They could individually select the references, which would be read-only.
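For illustration, the nested author-date field can be generated with Python’s ElementTree. The text namespace URI is ODF’s own. The helper name author_date_field is hypothetical, and the RDF binding via xml:id is not modeled. The sketch shows that the visible string is just the concatenated text of the nested fields, while each reference stays addressable by its own xml:id.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
ET.register_namespace("text", TEXT)


def author_date_field(cid: str, refs: list[str]) -> ET.Element:
    """Outer citation field wrapping one read-only inner field per reference."""
    outer = ET.Element(f"{{{TEXT}}}meta-field", {XML_ID: cid})
    outer.text = "("
    for i, ref in enumerate(refs, start=1):
        inner = ET.SubElement(outer, f"{{{TEXT}}}meta-field", {XML_ID: f"{cid}-r{i}"})
        inner.text = ref
        # "; " between references, closing parenthesis after the last one.
        inner.tail = "; " if i < len(refs) else ")"
    return outer


field = author_date_field("citation-1", ["Doe, 1999", "Smith, 2000"])
print("".join(field.itertext()))  # (Doe, 1999; Smith, 2000)
print(ET.tostring(field, encoding="unicode"))
```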

So now: what happens if the user changes to a note-based style?

My argument is that because the footnote/endnote rendering is only an artifact of the processing, and does not reflect a user’s explicit choice, the XML encoding should reflect this by including the footnote within the outer field; something like:

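<text:meta-field xml:id="citation-1">
  <text:note text:note-class="footnote">
    <text:note-citation>1</text:note-citation>
    <text:note-body>
      <text:p>
        <text:meta-field xml:id="citation-1-r1">Doe, 1999, Some Title, New York: ABC Books.</text:meta-field>
        <text:meta-field xml:id="citation-1-r2">Smith, 2000, Some Other Title, London: XYZ Books.</text:meta-field>
      </text:p>
    </text:note-body>
  </text:note>
</text:meta-field>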

The only time a citation should be contained within a note is when a user explicitly chooses to do so.
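Here is a sketch of the proposed note-style encoding, again built with ElementTree. It places ODF’s footnote markup (text:note with text:note-class, text:note-citation and text:note-body) inside the outer citation field. A style-generated note can then be recognized by its position alone. Whether a given ODF schema actually permits text:note inside text:meta-field is exactly the kind of question raised here, so treat that nesting as an assumption. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("text", TEXT)


def qn(name: str) -> str:
    """Qualified name in the ODF text namespace."""
    return f"{{{TEXT}}}{name}"


def note_style_field(cid: str, refs: list[str], note_number: int) -> ET.Element:
    """Style-generated footnote kept inside the outer citation field."""
    outer = ET.Element(qn("meta-field"), {XML_ID: cid})
    note = ET.SubElement(outer, qn("note"), {qn("note-class"): "footnote"})
    ET.SubElement(note, qn("note-citation")).text = str(note_number)
    para = ET.SubElement(ET.SubElement(note, qn("note-body")), qn("p"))
    for i, ref in enumerate(refs, start=1):
        inner = ET.SubElement(para, qn("meta-field"), {XML_ID: f"{cid}-r{i}"})
        inner.text = ref
        inner.tail = " " if i < len(refs) else ""
    return outer


def is_style_generated_note(elem: ET.Element) -> bool:
    """True when a note sits inside a citation field, i.e. the style created it.

    A user-created note wraps the citation field instead, so this returns False for it.
    """
    return elem.tag == qn("meta-field") and elem.find(qn("note")) is not None


field = note_style_field(
    "citation-1",
    ["Doe, 1999, Some Title, New York: ABC Books.",
     "Smith, 2000, Some Other Title, London: XYZ Books."],
    note_number=1,
)
print(ET.tostring(field, encoding="unicode"))
```

Under this convention, switching back to an author-date style means discarding the generated text:note and re-rendering the inner fields. A note that wraps a citation field, which reflects the user’s explicit choice, is left alone.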

So the questions are, I suppose:

  1. Does this make sense from a user-experience and document-encoding perspective?
  2. Can this be implemented such that, at some point in the not-distant future, we can have interoperability across different editing and bibliographic applications?

To be more concrete, when MS adds support for note-based citations, how will they encode them in OOXML? When OOo developers add support for the new metadata field and citations, how will they do it?

[update: fixed some minor typos]


Posted in Uncategorized on May 17th, 2007 by darcusb – Comments Off

I was wondering when someone would finally apply best-of-breed contemporary web design approaches to the realm of citations. Well, BibMe does just that: AJAX and Ruby on Rails underpinnings and a gorgeous interface. What more could a time-strapped undergraduate want?

There are lessons here for applications aimed at more professional scholarly users. Consider how clean and simple it is to enter a book. The default interface allows you to enter a title or ISBN:

[Screenshot: default book interface]

So I enter my book title, and some AJAX magic quickly brings up a results-list without loading a new page:

[Screenshot: results list]

Finally, I choose the correct item, and it gives me the pre-filled metadata:

[Screenshot: filled form]

If that auto-fill stuff doesn’t work, a quick JS-enabled flip to the “manual” form yields this:

[Screenshot: manual book entry form]

Nice; this is how all online bibliographic managers ought to work!
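BibMe’s default book form accepts either a title or an ISBN in one box. How BibMe decides which it received is not described here. One plausible approach, sketched in Python as an assumption, is to treat the input as an ISBN only when its ISBN-10 or ISBN-13 check digit validates, and as a title search otherwise. The function names are hypothetical.

```python
def is_valid_isbn(raw: str) -> bool:
    """Validate an ISBN-10 or ISBN-13 check digit; hyphens and spaces are ignored."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and s[:9].isdecimal() and (s[9].isdecimal() or s[9] == "X"):
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # ISBN-10: weights 10 down to 1; the weighted sum must be divisible by 11.
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(s) == 13 and s.isdecimal():
        # ISBN-13: alternating weights 1 and 3; the weighted sum must be divisible by 10.
        return sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


def classify_query(raw: str) -> str:
    """Route input from the single search box to an ISBN lookup or a title search."""
    return "isbn" if is_valid_isbn(raw) else "title"


for query in ("0-306-40615-2", "978-0-306-40615-7", "Some Title"):
    print(query, classify_query(query))
```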

My only real critique (and it is fairly minor) is that they didn’t OpenID-enable the service.

And looking farther out, it’s really unfortunate that we’re faced with two levels of application. One is the simple, more-or-less manual citation process of BibMe and others. The other is the more robust and automated integration of Endnote, Zotero, BookEnds, etc. As word-processors like Word, OpenOffice, Google Docs, etc. and their file formats (OOXML, ODF) start to get real citation support, though, this awkwardness ought to go away, and we can have richer, more automated and more interoperable solutions.

Note: BibMe prefills “2005” as the year for my book. While it is strictly true that it was published in 2005, the copyright date is actually 2006. I think this would result in a technically incorrect bibliographic entry (though I have seen others cite it this way, so who knows?).

DCMI Abstract Model and RDF

Posted in Uncategorized on May 7th, 2007 by darcusb – Comments Off

Library blogs were buzzing last week about a recent announcement of a collaboration around RDA and DC [summary here]. For those not familiar with the acronyms, this basically means an effort to bring a very high-level next-generation approach to library cataloguing together with more grounded ways of encoding metadata in the DC world. It also represents an effort to bring this library expertise to the semantic web.

In general, the commentary is positive. However, Jenn Riley brings up an interesting critique.

This seems to me to be entirely backwards – trying to harmonize DC principles with RDA after the fact. Didn’t the DC community learn its lesson about the pitfalls of this approach when developing the Abstract Model, only realizing long after developing a metadata element set that it would benefit from an underlying model?

She goes on to explain:

This general approach failed miserably with the DC Libraries Application Profile. There, the application profile developers wanted to use some elements from MODS, but weren’t able to because MODS doesn’t conform to the DCMI Abstract Model. So basically what the DC community said here was that application profiles are great, they form the fundamental basis of DC extensibility, but, oh yeah, you can’t actually use elements from any other standards unless they conform to the Abstract Model, even though there are no approved encodings for even DC itself more than two years after the Abstract Model was released. OK then. Way to foster collaboration between metadata communities.

Ah, here’s this problem again: the DC group absolutely rightly argues that a model is essential for any real extensibility and interoperability. But the message for the value proposition of the DCMI Abstract Model per se is lost.

If I had a recommendation for the DCMI, it would be to drop the Abstract Model and use RDF. It would save a lot of technical and evangelism work. I’ve said this before, but the Abstract Model offers no convincing value to me. Its model is essentially equivalent to RDF’s, and its claimed advantage is that it has a non-RDF XML syntax. The problem is that this syntax is even uglier and more complicated than RDF/XML!

Turning the focus to RDF does not solve Jenn’s issue, of course. The whole point is you agree on a model so that you can go your own way on all the details that really matter: what you call a book title, how you represent related items, and so forth. But with RDF, you get a rich infrastructure of technology and tools that goes way beyond the DCMI Abstract Model. That infrastructure includes OWL, RDF Schema … and GRDDL.

MODS has no model, and so cannot really be harmonized properly with anything. The solution there is to create an RDF version that can. But with GRDDL, you can just write an XSLT to map your XML to RDF, and so get something like the best of both worlds. You keep the flexibility to use your own XML, and you gain the ability to merge it with other kinds of metadata descriptions via a common model. It seems to me RDF + GRDDL is a better, more practical solution to interoperability than the DCMI Abstract Model.
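GRDDL itself works by associating an XSLT transform with an XML document. The Python function below only stands in for such a transform, to show the shape of the mapping. It turns a few MODS elements into Dublin Core triples that could then be merged with other RDF descriptions. The record, the base URI, and the choice of dc:title and dc:creator as targets are hypothetical simplifications, not an official MODS-to-RDF mapping.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
DC = "http://purl.org/dc/elements/1.1/"

# A minimal, hypothetical MODS record.
RECORD = f"""<mods xmlns="{MODS}" ID="rec1">
  <titleInfo><title>Some Title</title></titleInfo>
  <name type="personal"><namePart>Doe, Jane</namePart></name>
</mods>"""


def mods_to_triples(xml_text: str, base: str = "http://example.org/") -> list[tuple[str, str, str]]:
    """Map a few MODS elements onto Dublin Core properties as RDF-style triples."""
    root = ET.fromstring(xml_text)
    subject = base + root.get("ID", "record")
    triples = []
    for title in root.findall(f"{{{MODS}}}titleInfo/{{{MODS}}}title"):
        triples.append((subject, DC + "title", title.text))
    for part in root.findall(f"{{{MODS}}}name/{{{MODS}}}namePart"):
        triples.append((subject, DC + "creator", part.text))
    return triples


# Triples from different sources can be merged simply by set union.
merged = set(mods_to_triples(RECORD)) | {("http://example.org/rec1", DC + "date", "2007")}
for triple in sorted(merged):
    print(triple)
```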

Ultimately the RDA/DC announcement will allow just this since it will include an RDF vocabulary, but the message is indeed a little confusing.