darcusblog » 2003 » November - geek tools and the scholar

Archive for November, 2003

RELAX NG, MODS, and Annotation

Posted in General on November 26th, 2003 by darcusb – Comments Off

RELAX NG brings the power of xml schema to the masses. A few months ago I did not even know how to write a DTD. When I needed a way to code my course syllabi this year, I decided to bite the bullet and learn RELAX NG. It didn’t take me long to become comfortable with the compact syntax, and to understand not just how easy RELAX NG is, but how powerful.

So, I’ve converted MODS to RELAX NG. My next step – only made with a fair bit of help from contributors to the relaxng-users list – was to figure out how to merge it with my own schema for bibliographic annotations. Because MODS allows for foreign namespaces in its extension element, I wanted to create my own custom schema that I could plug in so I could use it to drive editing in the fantastic new nxml mode for emacs (available here).

Schema files available at:

Example instance at:

For the notes I started with docbook-ish markup, but switched to the more compact element names of html (along with additional attributes) to keep markup less intrusive. I may change back; who knows.

If you create a file in nxml mode and load the mods-bn schema, you will get tag completion on the extension element for the notes schema. If nothing else, perhaps this shows the potential of RELAX NG. More importantly, it’ll help me get some work done over the long weekend (holiday here in the U.S.).

Comments welcome. I’ve not quite determined if the attribute-based approach to tag bibliographic references will work for these needs, or whether I need to switch to something like the new DocBook stuff that has just been tentatively approved.

MODS: The Future of XML Bibliographic Metadata?

Posted in General on November 24th, 2003 by darcusb – 3 Comments

I’ve come across this question in two different forms today. Raymond Yee asks:

Question: As someone interested in marking up bibliographic metadata in XML (or with the RdfSpec), what specification/standard should I use? I’ve not found good answers yet to that questions.

Similarly, someone asked on the RefDB list:

- Is there really no established standard for representing references in XML? By TEI, for example, or something linked to the Dublin Core…

I confess to not really understanding RDF, nor its applicability to bibliographic metadata, despite the best efforts of Steve Cayzer to explain it to me. This doesn’t mean I’m an RDF sceptic; it just means I don’t understand it.

As for an adequate XML bibliographic metadata standard:

IMHO MODS offers structurally-sound comprehensiveness, without being too unwieldy. I see nothing else out there that is suitable for the range of documents I store and cite. Add the fact that it is backed by the Library of Congress, and is part of its comprehensive XML strategy, and MODS is a more-or-less no-brainer.

Is it perfect? No, of course not. It was designed for the library community, and there are things that are missing as a result. There is no obvious way to code publication status, for example. There is also no facility to code abbreviated names.

Still, rather minor critiques, and MODS has an extension element that allows adding markup from other namespaces. And the forthcoming (any day now) v3 of the schema adds a crucial new element for bibliographic markup that I have seen nowhere aside from the seemingly defunct BibX: “part.” It is used to describe details associated with so-called analyticals: book chapters, journal articles, web pages, etc. If nothing else, this structure from BibX will live on in MODS.

Ultimately, though, technology is social as much as anything, and the success of any standard depends on projects implementing it to solve real problems.

DocBook Citation Proposal Approved

Posted in General on November 20th, 2003 by darcusb – Comments Off

The TC gave tentative approval to our proposal to improve citation support in DocBook. While they weren’t fond of modifying the citation element itself, they agreed the addition of a new biblioref element would be a good thing. Great!


Posted in General on November 20th, 2003 by darcusb – Comments Off

Just as an experiment, I’ve put together a Relax NG schema that uses MODS for its metadata, and DocBook for its content. The schema – which is just a test – is defined like so:

    include "docbook.rnc" { start = document }
    include "mods.rnc"
    document = element document { meta, content }
    meta = element meta { ModsSchema }
    content = element content { article | book | chapter }

Here is a minimal example instance:

    <document>
      <meta>
        <mods xmlns="">
          <name type="personal">
            <namePart type="given">Bruce</namePart>
            <namePart type="family">D'Arcus</namePart>
          </name>
          <titleInfo>
            <title>A Journal Article</title>
          </titleInfo>
          <genre>article-journal</genre>
        </mods>
      </meta>
      <content>
        <article>
          <section>
            <title>Introduction</title>
            <para>Article text.</para>
          </section>
        </article>
      </content>
    </document>
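To see how the metadata and content halves of such an instance can be read separately, here is a rough Python sketch using only the standard library. The function name `summarize` and the keys of the dictionary it returns are illustrative assumptions, not part of the schema. The sketch keeps the empty `xmlns` from the instance above, so elements are matched without a namespace; a MODS record that declares its namespace would need namespace-qualified lookups instead.

```python
import xml.etree.ElementTree as ET

# The example instance from the post, reproduced verbatim (empty xmlns kept).
INSTANCE = """\
<document>
  <meta>
    <mods xmlns="">
      <name type="personal">
        <namePart type="given">Bruce</namePart>
        <namePart type="family">D'Arcus</namePart>
      </name>
      <titleInfo>
        <title>A Journal Article</title>
      </titleInfo>
      <genre>article-journal</genre>
    </mods>
  </meta>
  <content>
    <article>
      <section>
        <title>Introduction</title>
        <para>Article text.</para>
      </section>
    </article>
  </content>
</document>
"""


def summarize(xml_text):
    """Pull a few metadata fields and the section titles out of an instance.

    Hypothetical helper: the returned keys are chosen for illustration only.
    """
    root = ET.fromstring(xml_text)
    mods = root.find("meta/mods")
    given = mods.findtext("name/namePart[@type='given']")
    family = mods.findtext("name/namePart[@type='family']")
    return {
        "author": f"{given} {family}",
        "title": mods.findtext("titleInfo/title"),
        "genre": mods.findtext("genre"),
        # Section titles come from the DocBook half of the document.
        "sections": [s.findtext("title") for s in root.iter("section")],
    }


info = summarize(INSTANCE)
print(info)
```

The point of the split is visible in the lookups: everything under `meta` is addressed with MODS element names, everything under `content` with DocBook ones.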

There are a few things missing from MODS compared to the DiVA Schema. One is a way to code the publication status of a document. Is it published, or unpublished? If the latter, is it in review, in press, etc.?

Also, I hope to see the Library of Congress release a Relax NG version of MODS. It seems to me the center of gravity in the XML world is shifting to Relax NG, at least where free software is concerned.

DiVA: XML Academic Publishing

Posted in General on November 20th, 2003 by darcusb – Comments Off

Interesting article from D-Lib about an XML-based document system out of Sweden. They defined their own schema that uses a DocBook content model, but a far richer metadata model that seems to be influenced by – among other things – Dublin Core, MODS, and the FRBR.

I believe Norm Walsh has talked about allowing the use of different metadata models in a future version of DocBook. This strikes me as a really good idea.

Documents, Presentations and “Profiling”

Posted in General on November 18th, 2003 by darcusb – Comments Off

I posted the following question on the DocBook-apps list:

I wonder if anyone has experimented with coding documents in such a way that slide presentations can be extracted from them? In other words, my (academic) documents are almost all of the basic structure:

intro
literature
case study
conclusion

There might be a second-order sectioning within that as well, but that itself can only form the skeleton of a presentation. Any suggestions on how to code a document such that it can be virtually completely built directly from the document (as opposed to a separate “slides” file or whatever)? Maybe have “invisible” sections somehow?

In response, Thomas Gier pointed me to “profiling.”

This seems to be what I’m looking for. Not surprisingly, the attributes used to tag content are specific to computer documentation, so I may need to use a customization layer to handle this for my work.
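The idea behind profiling can be sketched in a few lines of Python. This is not how the DocBook stylesheets implement it; it is only an illustration of the filtering step, assuming the `condition` attribute is used for tagging and that `slides` and `paper` are hypothetical values chosen for academic work. Elements tagged for another profile are dropped, and untagged elements are kept in every output.

```python
import xml.etree.ElementTree as ET

# A toy article; the "slides" and "paper" values are assumptions for this sketch.
ARTICLE = """\
<article>
  <section>
    <title>Intro</title>
    <para condition="slides">Key claim in one line.</para>
    <para condition="paper">The full argument, with citations.</para>
  </section>
  <section>
    <title>Conclusion</title>
    <para>Text shared by both outputs.</para>
  </section>
</article>
"""


def profile(element, value, attribute="condition"):
    """Remove descendants tagged for a different profile, in place.

    An element is kept if it has no profiling attribute, or if `value`
    is one of its semicolon-separated attribute values.
    """
    for child in list(element):  # copy, since we remove while iterating
        tagged = child.get(attribute)
        if tagged is not None and value not in tagged.split(";"):
            element.remove(child)
        else:
            profile(child, value, attribute)
    return element


slides = profile(ET.fromstring(ARTICLE), "slides")
paras = [p.text for p in slides.iter("para")]
titles = [t.text for t in slides.iter("title")]
print(titles, paras)
```

Running the same function with `"paper"` would keep the full argument and drop the one-line claim, so a single source file could feed both outputs.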

Anyone out there have an xslt file to generate Keynote presentations?

RefDB and Open Source Development

Posted in General on November 16th, 2003 by darcusb – 2 Comments

For an academic in need of reference management, storing and retrieving references only covers one need. The other significant need is document formatting. This is a difficult task, as journals and publishers all have their individual styles.

AFAIK the only project to adequately deal with formatting bibliographic citations in XML documents is RefDB. For a long time I’ve been of the opinion that open source projects should adopt RefDB as their core storage and formatting engine, and focus on UI and other innovations.

RefDB is based on open source relational database storage, but has an abstraction layer that allows support for MySQL, PostgreSQL, and SQLite. It has a clean client-server design, and includes a Perl module that communicates with RefDB via its daemon, designed precisely to interface with a GUI. Apparently writing such modules – in PHP, or Java, or Objective-C – is trivial; roughly a day’s work. Moreover, the developer – Markus Hoenicka – likely knows more about formatting XML-based bibliographies and citations than anyone on the planet.

For whatever reasons, my argument has been generally rejected.

I have thus come to the conclusion that it might be a better approach to modularize the pieces of a successful bibliographic system. While RefDB has a functional web interface, it lacks some of the functionality I’d like to see:

  1. Support for rich annotation, with xslt rendering. Beyond rendering for the web, I’d like to click a button and have a pop-up with a configurable transform: to LaTeX, DocBook, TEI … whatever.
  2. Hot-linked queries on author names, keywords, etc.
  3. Better record entry UI, which is currently just a text field in which you can paste RIS or BibTeX data. Ideal would be a full-blown configurable form UI modeled on MODS, but I could imagine also something more modest; perhaps a selector that would insert the proper XML skeleton based on the reference type?

I’d really like to see this implemented using the xslt-based approach of Syncato/Cocoon/Popoon, which could easily be adapted to different storage systems. So, some might use it with RefDB, others with a native XML DB solution, and still others with flat files.

Any takers? Alas, beyond basic xsl, I can’t code!

Cocoon/Popoon/Syncato and a Bibliographic Interface

Posted in General on November 16th, 2003 by darcusb – Comments Off

I’m quite intrigued by web UI technologies that take advantage of XML and XSLT. The most prominent example is Cocoon. The eXist native XML database project takes advantage of Cocoon. The project even provides a demo that presents bibliographic data in quite interesting ways, almost all implemented with xsl code. Indeed, I was easily able to modify it to handle data conforming to an alternative schema I’ve been working on.

Turns out the author of eXist – Wolfgang Meier – is a social scientist. He has also been involved in development of the SozioNet bibliographic metadata portal out of Germany, which includes an interesting form-based UI called the MetaWizard.

Popoon is a PHP-based equivalent to Cocoon, and Syncato is in many ways a Python equivalent.

It seems to me all of these could provide a good basis on which to build a next generation web-based bibliographic interface.

New Syncato Web Admin UI

Posted in General on November 16th, 2003 by darcusb – Comments Off

Kimbro Staken – author of Syncato – is planning to add an enhanced web administration interface to the next release.

The screenshots show that it will have an entry and editing UI, with a checkbox list of categories. It will also have a table view to display all entries, their author, date, etc.

This is of course for managing a weblog. But is it just me, or is this not unlike a bibliographic record management UI?

In the weblog entry interface, for example, there is a “title” field and a “body” field. The first represents the metadata for the entry (I assume author and date are added behind the scenes), and the second the content. What if there was a way to plug in different metadata models? In my case, it’d be for bibliographic metadata. The “body” would then be notes attached to that record.

That, along with a way to authenticate access, would make a nice bibliographic management system.

XML and Bibliographic “Micro-Content”

Posted in General on November 13th, 2003 by darcusb – Comments Off

Earlier I posted an example of how xml markup can be used to encode meaning in bibliographic data. In the example I gave, I used it to attach metadata to a quoted excerpt from a work. This reflects my general interest in greatly enhancing annotation support in bibliographic systems.

How to apply this to the world of GUIs? The new weblog system Syncato provides a hint, and Jon Udell gives an illustration of how one can use url-based xpath expressions to extract microcontent.

So what if instead of using such a system for posting on and querying a weblog, one used it to query and view bibliographic entries, complete with metadata enriched annotations? And what if the returned results themselves had hot-linked queries attached to particular information?

Say we have my earlier example, slightly modified to add additional markup:

<para>A <emphasis>paragraph</emphasis> with a <quote name="John Doe" name-id="doej" keywords="one two three">quote that includes <emphasis source="original">emphasis</emphasis></quote>.</para>
I have a simple search field in which I search for a keyword of “one,” which returns a record that includes the following:


A paragraph, with a quote that includes emphasis [source: John Doe; emphasis in original].

The link on the source name above is non-functional, but you could imagine it contained an expression that allowed the user to click on the link and get all results for “John Doe,” which in this case might just be a note attached to a name record.
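The keyword lookup and the rendering of the source note could be sketched in Python along these lines. The function names `quotes_with_keyword` and `render` are hypothetical, and the sketch assumes keywords are space-separated, as in the markup above. It produces plain text only; turning the source name into a hot-linked query is left out.

```python
import xml.etree.ElementTree as ET

# The annotated paragraph from the example above.
NOTE = (
    '<para>A <emphasis>paragraph</emphasis> with a '
    '<quote name="John Doe" name-id="doej" keywords="one two three">'
    'quote that includes <emphasis source="original">emphasis</emphasis>'
    '</quote>.</para>'
)


def quotes_with_keyword(para, keyword):
    """Return the quote elements whose keywords attribute contains keyword."""
    return [q for q in para.iter("quote")
            if keyword in q.get("keywords", "").split()]


def render(para):
    """Flatten a para to text, adding a bracketed source note after each quote."""
    parts = [para.text or ""]
    for child in para:
        parts.append("".join(child.itertext()))
        if child.tag == "quote":
            note = "source: " + child.get("name", "unknown")
            # Emphasis marked source="original" belongs to the quoted author.
            if child.find("emphasis[@source='original']") is not None:
                note += "; emphasis in original"
            parts.append(" [" + note + "]")
        parts.append(child.tail or "")
    return "".join(parts)


para = ET.fromstring(NOTE)
if quotes_with_keyword(para, "one"):
    print(render(para))
```

Because the source name and its `name-id` are attributes rather than text, the same record could just as easily yield a link target for a query on that name.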

Now imagine extending this to author names, to keywords, etc., etc.