darcusblog » 2006 » April - geek tools and the scholar

Archive for April, 2006

rBib Comments

Posted in Uncategorized on April 22nd, 2006 by darcusb

RefDB’s Markus Hoenicka has taken a shot at updating his RISX schema. To quote him on goals:

  • Preserve the scope of RIS. For the sake of compatibility all RIS reference types should be supported, and the new format should be able to hold all information contained in RIS entries.
  • Extend the scope of RIS where the latter is crippled. If there is a CHAP type and a BOOK type, then there must also be a SONG type along with the SOUND type.
  • Untangle the multiple-purpose fields like M1-M3,IS and the like. Only analogous information should be stored in the same elements/database fields. Define separate elements/database fields for unrelated information. With 200GB platters hitting the market, there is no need to fold unrelated stuff into the same field.
  • Sanitize the illogical A1-A3 and T1-T3 levels. These used to be a mix of the orthogonal concepts of “most likely to be asked for” on the one hand and the three-layer librarian approach on the other. Stick with the latter.
  • Drop the distinction between journals and other publications that contain parts. Turn each publication with parts into a separately citable entry. Do the same with sets which are composed of several publications.
  • Support relations between entries. “Is-part-of” is an obvious one, but we also might have “also-published-as” or “cited-after” relationships.
  • Provide validation during data entry. This is best done using an XML schema language.
  • Turn the schema into a data entry form. The schema should restrict data entry for each supported publication type in a way that you can’t enter information which is not useful for this type, and which would therefore not be stored in the database anyway.
  • Turn the schema into a database schema blueprint. It should be easy to deduce what information needs to be stored in order to support all reference types.

So the design does a few things to meet these goals:

  1. preserves the three-level structure of RIS
  2. opens up some flexibility in encoding the “publication” level
  3. uses RELAX NG to provide useful validation constraints not possible with DTDs
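To make the three-level structure concrete, here is a minimal sketch in Python that groups the RIS A1-A3 and T1-T3 fields by level. The level labels ("part", "publication", "set") and the function name `group_by_level` are my own assumptions for illustration; they are not RISX element names, and the sample record is made up.

```python
import re

# RIS lines look like "T1  - Some title": a two-character tag,
# two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")

# Hypothetical labels for the three levels; RISX may name them differently.
LEVELS = {"1": "part", "2": "publication", "3": "set"}
KINDS = {"A": "authors", "T": "titles"}

# Made-up sample record, for illustration only.
SAMPLE = """\
TY  - CHAP
A1  - Doe, Jane
T1  - A Chapter
A2  - Roe, Richard
T2  - The Book
T3  - The Series
ER  - 
"""


def group_by_level(ris_text):
    """Collect A1-A3 and T1-T3 values under one of three levels."""
    levels = {}
    for line in ris_text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue
        tag, value = match.groups()
        # Only the author (A) and title (T) tags numbered 1-3 carry a level.
        if tag[0] in KINDS and tag[1] in LEVELS:
            level = levels.setdefault(LEVELS[tag[1]], {})
            level.setdefault(KINDS[tag[0]], []).append(value)
    return levels


if __name__ == "__main__":
    for level, fields in group_by_level(SAMPLE).items():
        print(level, fields)
```

Everything other than the numbered author and title tags is ignored here; a real converter would also have to deal with the multi-purpose fields Markus mentions.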

Rather than deconstruct Markus’ goals and implementation choices in detail, let me instead offer an alternative I’ve been working on, related to the work I was doing on an RDF schema. My goals:

  1. fully represent legacy data (like RIS)
  2. provide flexible relational capabilities, to capture the full range of citation data, including that common in the humanities and law
  3. be compatible with RDF and usable with Atom (because more and more reference data is shared over the web, and with all the talk of OpenSearch and GData, Atom is clearly getting a lot of attention)
  4. have a RELAX NG representation that eases working with the format using XML tools

So similar goals, but I’m less interested in preserving legacy structures, and more interested in exploiting new opportunities around the semantic web.

To wit, an example. Of note:

  1. the file is a valid Atom feed
  2. content is namespaced to allow mixing
  3. the atom:entry element embeds the bibliographic metadata, which can be trivially converted to RDF and imported into RDBMSes
  4. it is easy to write a schema that allows these files to be authored and validated as standalone documents or as embedded content; indeed, I have
  5. typing is not flat, but tied to the proper relational structure. One does not have types like “journal article” or “conference paper” but rather an “article” published in a “journal,” and a “paper” presented at a “conference”

For comparison, both CiteULike and Ingenta offer RSS 1.0/RDF feeds of bibliographic data that are quite similar. Might be nice if we can all agree on a way forward.