The beginnings of a MODS-based bibliographic web application are now bundled with the daily snapshots of the eXist XML DB. It has a simple Google-like search interface in the sidebar, with some powerful XQuery-based search technology underneath.
Archive for July, 2004
Chris Putnam has released v3.7 of his bibutils bibliographic conversion utilities. Up until now, the suite of tools did a nice job of converting from BibTeX, RIS, and Endnote to MODS. Now it adds the ability to go from MODS back to those formats. What this means is it provides a sort of universal translator, allowing one to go, for example, from BibTeX to RIS by way of MODS.
One valuable feature in Endnote is the ability to search online library databases and download records almost as easily as doing a local search. Open source projects that look to provide reasonable alternatives to commercial applications need to provide such functionality.
Endnote’s functionality is based on the z39.50 protocol, and its server configuration files (there are thousands of them) are proprietary.
Thankfully, a new standard has emerged that will ultimately supplant z39.50. It’s called SRW, and is built around XML. Not only does it retrieve XML records such as MODS, but it uses XML configuration files based on a standard called ZeeRex (my understanding is that ZeeRex can be used with z39.50 too). It also has a purpose-built query language called CQL.
See also ZOOM.
I don’t understand the details of all of this, but at the OOoBib project, Rob Sanderson made a convincing argument a while back not only to use this technology for remote searches, but to build the entire query code, including for local resources, around it. I find this quite intriguing myself. It seems fully consistent with the Bibliophile project’s interest in cross-database searching.
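To make the idea of one query layer for both local and remote sources a bit more concrete, here is a minimal sketch in Python. It only builds a CQL string and hands the same string to every backend; the function names and the `backends` mapping are hypothetical, and the `dc.creator` / `dc.title` index names assume a server that exposes Dublin Core indexes. It is not how OOoBib or any SRW toolkit actually does it.

```python
def cql_clause(index, term, relation="="):
    """Return one CQL search clause with the term in double quotes."""
    # Backslash-escape backslashes and double quotes inside the term.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


def cql_and(*clauses):
    """Join clauses with the CQL boolean operator 'and'."""
    return " and ".join(clauses)


def search(query, backends):
    """Send the same CQL string to every backend and collect the results.

    `backends` (hypothetical) maps a label to any callable that accepts a
    CQL string; one might wrap an SRW client, another a local database.
    """
    return {label: run(query) for label, run in backends.items()}


query = cql_and(
    cql_clause("dc.creator", "Tufte"),
    cql_clause("dc.title", "Envisioning Information"),
)
print(query)
```

The point of the design is that the calling code never needs to know whether a backend is a remote library catalogue or a local store; each one only has to understand CQL.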
My last post seems to have stirred some controversy, which is good. My purpose is mainly that the next person who thinks “hmm ... maybe I ought to write an open source bibliographic application?” will think twice about how to do so.
In comments, Mark Grimshaw wrote:
[A] programmer designing conversions from bibliographic databases for such styles (as wikindx does) HAS to know what type a particular resource is as the presentation of the resource entry for a particular style very often depends directly on the type of resource. A journal article is displayed quite differently to a newspaper article, to a chapter in a book, to an article on the web or to a proceedings article.
I’ve had the same discussion recently with Paul Tremblay, and let me get quickly to the point and say that this is wrong. I put together a demonstration a while back that illustrates my argument. Moreover, I recently put together an XSLT stylesheet that shows fairly successfully that the approach works.
A data model and formatting system that is based only on bibtex-like typing will fall down once it has to handle the needs of many scholars. Data will be inaccurate or vague and formatting styles will fail to format any record type not explicitly defined.
The solution is a system that has a rigorous generic fallback system based around structural class, and only secondarily on genre/type. This is nothing radical; it’s just an extension of the existing models in Endnote and Reference Manager.
update: ok, here’s a compromise example:
- use typing for main layout, but list of types is extensible in schema
- require definitions for article, book, and chapter, which serve as the generic fallbacks
- some logic then associates other record types with their appropriate fallback
- rendering of name roles and genre and media description is moved into separate elements
This has the advantage of being more-or-less familiar and author-friendly, while also being quite flexible.
<citationstyle>
  <info> placeholder for metadata </info>
  <content>
    <name-roles>
      <role>
        <term>editor</term>
        <renderas>
          <single>Ed.</single>
          <multiple>Eds.</multiple>
        </renderas>
      </role>
    </name-roles>
    <genres>
      <genre>
        <term>dissertation</term>
        <renderas>PhD Dissertation</renderas>
      </genre>
      <genre>
        <term>letter</term>
        <renderas>letter</renderas>
      </genre>
    </genres>
    <media>
      <medium>
        <term>cdrom</term>
        <renderas>CD-ROM</renderas>
      </medium>
    </media>
    <citation>
      <author-year>
        <names>
          <firstname/>
          <middlename/>
          <lastname/>
        </names>
        <entry>
          <creator/>
          <year before=", "/>
          <point before=": "/>
        </entry>
      </author-year>
    </citation>
    <bibliography>
      <names>
        <firstname/>
        <middlename/>
        <lastname/>
      </names>
      <entry>
        <reftype name="book">
          <creator/>
          <date before=" (" after=") ">
            <year/>
          </date>
          <title font-shape="italic" after=", "/>
          <origin>
            <place after=":"/>
            <publisher/>
          </origin>
          <physical-location before=", "/>
          <url before=", "/>
        </reftype>
        <reftype name="chapter">
          <creator/>
          <date before=" (" after=") ">
            <year/>
          </date>
          <title/>
          <container before=", In ">
            <creator after=", "/>
            <title/>
            <origin after=", ">
              <place/>
              <publisher before=":"/>
            </origin>
            <part-details>
              <pages/>
            </part-details>
          </container>
          <physical-location/>
          <url/>
        </reftype>
        <reftype name="article">
          <creator/>
          <date before=" (" after=") ">
            <year/>
          </date>
          <title before="“" after="”"/>
          <container>
            <title/>
            <origin/>
            <part-details>
              <volume/>
              <issue before="(" after=")"/>
            </part-details>
          </container>
        </reftype>
      </entry>
    </bibliography>
  </content>
</citationstyle>
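The “some logic” that associates other record types with a fallback could look something like the following Python sketch. Everything in it is an assumption for illustration: the record is a plain dict, the `TYPE_TO_FALLBACK` table and the `container` / `serial` keys are made up, and the structural guess at the end is just one way to catch types nobody listed.

```python
# Layouts every style must define; they double as the generic fallbacks.
REQUIRED = {"article", "book", "chapter"}

# Hypothetical table associating extra record types with a fallback.
TYPE_TO_FALLBACK = {
    "newspaper-article": "article",
    "dissertation": "book",
    "proceedings-paper": "chapter",
}


def structural_fallback(record):
    """Pick a fallback from the record's structure when its type is unknown."""
    container = record.get("container")
    if container is None:
        return "book"        # stands on its own
    if container.get("serial"):
        return "article"     # part of a periodical
    return "chapter"         # part of a monograph


def choose_layout(record, style_layouts):
    """Return the name of the layout a style should use for this record."""
    missing = REQUIRED - set(style_layouts)
    if missing:
        raise ValueError(f"style lacks required layouts: {sorted(missing)}")
    rtype = record.get("type")
    if rtype in style_layouts:
        return rtype         # the style defines this type explicitly
    # Otherwise use the listed association, then the structural guess.
    return TYPE_TO_FALLBACK.get(rtype) or structural_fallback(record)
```

With a style that defines only the three required layouts, a record typed `newspaper-article` is formatted as an article, and a type the table has never heard of still gets a sensible layout from its structure instead of failing.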
Two new open source bibliographic projects of note. The first is called TEI XML Bibliography Project, and comes from Paul Tremblay. Paul was originally aiming for something more ambitious (e.g. a new bibliographic schema), but I think our conversations convinced him it’d be better to focus on something less grand.
I’ve tried to convince him of the importance of BiblioX to bibliographic formatting in XML, but obviously failed. There are thousands of bibliographic styles in circulation, and it seems hopelessly unworkable to write individual XSLT stylesheets for each and every one of them, for each and every output format. While BiblioX still needs a lot of work, the basic principle of a document- and bibliographic-data-agnostic XSLT-based formatting system is sound.
I also stumbled on yet another new project called Bibliophile. This one is a little different in that it is not a standalone software project, but rather an effort at standardization.
Bibliophile is an initiative to align the development of bibliographic databases for the web. It aims to promote standards, discussion among users on necessary features and a variety of specific solutions for different fields of research.
I like this. It gets even more interesting when they say they’re standardizing on MODS.
However, it’s also a little disheartening to see so many projects (usually based on PHP and MySQL) reinventing the wheel over and over again, and often not very well. In particular, it’s one thing to support MODS for data exchange, but these projects really need rich internal data models up to the task of representing MODS data. And yet, they all seem to start and end with the BibTeX data model. Please, people, understand that BibTeX has a very limited, totally flat data model that is not at all sufficient for scholars outside of the hard sciences, and which takes zero advantage of the power of relational databases and (even more) XML.
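Here is a small Python sketch of the difference, using made-up placeholder data. The flat dict mimics a BibTeX entry; the nested one is only loosely modelled on the way MODS lets an item point at the item that hosts it, and the `host`, `names`, and `origin` keys are my own shorthand, not MODS element names.

```python
# BibTeX-style flat record: the containing book is squeezed into ad hoc
# fields of the chapter, and there is nowhere to put the book's series.
flat_chapter = {
    "type": "incollection",
    "author": "Doe, Jane",
    "title": "A Chapter",
    "booktitle": "An Edited Volume",
    "editor": "Roe, Richard",
    "year": "2004",
}

# Nested record: each container is a full record in its own right, with
# its own names and roles, and can itself sit inside another container.
nested_chapter = {
    "title": "A Chapter",
    "names": [{"name": "Doe, Jane", "role": "author"}],
    "host": {
        "title": "An Edited Volume",
        "names": [{"name": "Roe, Richard", "role": "editor"}],
        "origin": {"year": "2004"},
        "host": {"title": "A Book Series"},
    },
}


def titles(record):
    """Walk from an item up through every container it sits in."""
    chain = []
    while record is not None:
        chain.append(record["title"])
        record = record.get("host")
    return chain


print(titles(nested_chapter))
```

The same `titles` walk works for an article in a journal, a chapter in a book in a series, or a standalone book, which is the kind of generality a flat field list cannot offer without inventing a new field for every level.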
So where to look instead for inspiration? RefDB already has a better data model than BibTeX, and is currently being revamped with a richer MODS-compatible data model (see here for some of the code). And someone is working on a PHP-based interface, which will likely ultimately be based on a PHP module similar to the current RefDB Perl module.
For something more radical, how about LibDB? LibDB is written by Perl hacker Morbus Iff, and is based on principles in the FRBR (pdf). Its SQL schema has separate tables for works, for people and their roles, for events, etc., and Morbus is seriously considering revamping it as a plug-in for the open source CMS Drupal. While begun as a project to store videos, it is designed to store any bibliographic metadata (save, as yet, for what librarians call “analyticals”: articles, book chapters, etc.).
update: David Wilson just pointed me to another potentially-interesting-but-flawed project, called B3, that bases its data model on BibTeX.
Always up to interesting things, Hans Hagen is adding support for FO to his ConTeXt macro package. Nothing much to see yet, except a new listserv. At some point, perhaps this could be a good alternative to PassiveTeX.