darcusblog » 2008 » April - geek tools and the scholar

Archive for April, 2008

Google App Engine

Posted in Technology on April 8th, 2008 by darcusb – Comments Off

While Microsoft was busy this weekend sending threatening letters to Yahoo in an effort to buy a real presence on the web, Planet Python is awash this morning in news of Google’s new web app effort, and with damned good reason! Since App Engine’s Python runtime supports Django, this will be a quantum boost for a framework that was already building steady momentum.

Now, if Google could just make it brain-dead easy to integrate such applications with Google Docs, then things might get really interesting.

Author Lists

Posted in Technology on April 6th, 2008 by darcusb – 5 Comments

As Fred and I gear up to finally release a formal first draft of the bibliographic ontology, one of the biggest decisions we have had to make is how to represent different kinds of contributions. When you have a single book author, this is easy. But there are all kinds of complicated real-world examples that make this a difficult issue to resolve.

Let’s be concrete and look at an example from the journal Nature. We have here an article with 22 contributors. The list of contributors in turn has 12 notes attached to it, which for the most part indicate affiliation, but also group what seem to be primary authors. Finally, after the enumerated notes we have a note that indicates the corresponding author.

So the first question is, how does Nature represent this in a standard legacy format like RIS? Answer: they just have an ordered author list:

TY  - JOUR
AU  - Kleinman, Mark E.
AU  - Yamada, Kiyoshi
AU  - Takeda, Atsunobu
AU  - Chandrasekaran, Vasu
AU  - Nozaki, Miho
AU  - Baffi, Judit Z.
AU  - Albuquerque, Romulo J. C.
AU  - Yamasaki, Satoshi
AU  - Itaya, Masahiro
AU  - Pan, Yuzhen
AU  - Appukuttan, Binoy
AU  - Gibbs, Daniel
AU  - Yang, Zhenglin
AU  - Kariko, Katalin
AU  - Ambati, Balamurali K.
AU  - Wilgus, Traci A.
AU  - DiPietro, Luisa A.
AU  - Sakurai, Eiji
AU  - Zhang, Kang
AU  - Smith, Justine R.
AU  - Taylor, Ethan W.
AU  - Ambati, Jayakrishna
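
In RIS, the only thing carrying contribution order is line order. A minimal Python sketch of reading the AU values back out in sequence, assuming the simple `TAG  - value` lines shown above (the function name `ris_authors` is hypothetical, and it ignores continuation lines and other RIS variants):

```python
def ris_authors(record: str) -> list[str]:
    """Return the AU values of an RIS record in the order they appear."""
    authors = []
    for line in record.splitlines():
        # Each line is a two-character tag, two spaces, a hyphen, a space, then the value.
        tag, sep, value = line.partition("  - ")
        if sep and tag == "AU":
            authors.append(value.strip())
    return authors


# The first three authors of the Nature record above, plus the RIS end-of-record tag.
record = """TY  - JOUR
AU  - Kleinman, Mark E.
AU  - Yamada, Kiyoshi
AU  - Takeda, Atsunobu
ER  - """

print(ris_authors(record))  # ['Kleinman, Mark E.', 'Yamada, Kiyoshi', 'Takeda, Atsunobu']
```

Nothing in the record says "position 1"; the first author is first only because its line comes first.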

How do we do this in a more relational model, though, say a relational database or RDF? Both of these models are unordered.

One option is to simply translate this directly to RDF:

@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html>
    a bibo:AcademicArticle ;
    dc:creator "Kleinman, Mark E." ;
    dc:creator "Yamada, Kiyoshi" ;
    dc:creator "Takeda, Atsunobu" ;
    dc:creator "Chandrasekaran, Vasu" ;
    dc:creator "Nozaki, Miho" ;
    dc:creator "Baffi, Judit Z." ;
    dc:creator "Albuquerque, Romulo J. C." ;
    dc:creator "Yamasaki, Satoshi" ;
    dc:creator "Itaya, Masahiro" ;
    dc:creator "Pan, Yuzhen" ;
    dc:creator "Appukuttan, Binoy" ;
    dc:creator "Gibbs, Daniel" ;
    dc:creator "Yang, Zhenglin" ;
    dc:creator "Kariko, Katalin" ;
    dc:creator "Ambati, Balamurali K." ;
    dc:creator "Wilgus, Traci A." ;
    dc:creator "DiPietro, Luisa A." ;
    dc:creator "Sakurai, Eiji" ;
    dc:creator "Zhang, Kang" ;
    dc:creator "Smith, Justine R." ;
    dc:creator "Taylor, Ethan W." ;
    dc:creator "Ambati, Jayakrishna" .

This is what Ingenta does in its RSS/RDF feeds. The problem here is that you lose order, and hence relative contribution. You also aren’t treating the authors as full objects, but just dumb strings. You can’t, for example, attach affiliation information to them.
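
To see why order disappears, a minimal Python sketch can model an RDF graph as what it is, a set of (subject, predicate, object) triples. The Dublin Core 1.1 `creator` URI is an assumption about which namespace such a feed would use, and the author list is a subset of the record above, in its original order:

```python
ARTICLE = "http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# A subset of the RIS author list, in its original (meaningful) order.
authors = ["Kleinman, Mark E.", "Yamada, Kiyoshi", "Albuquerque, Romulo J. C.", "Ambati, Jayakrishna"]

# One dc:creator triple per author; a set keeps membership but not sequence.
graph = {(ARTICLE, DC_CREATOR, name) for name in authors}


def creators(graph, subject):
    """Return dc:creator values for subject. The set has no order, so sort for a stable result."""
    return sorted(o for s, p, o in graph if s == subject and p == DC_CREATOR)


print(creators(graph, ARTICLE)[0])  # Albuquerque, Romulo J. C. (not the first author)
```

The only stable order available is one imposed after the fact (here, alphabetical), which has nothing to do with relative contribution.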

Another option is an even simpler, de-normalized form: a single string containing a delimited list of author names. In RDF, you’d basically join the creator strings into a single property value.

This preserves order, but it doesn’t get you very far. From the data-model perspective, the meaning of the data within that string is totally opaque. You can’t, for example, search by author name without some programming gymnastics.
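
Here is roughly what those gymnastics look like, as a minimal Python sketch. The `"; "` delimiter and the "Family, Given" name layout are assumptions; nothing in the de-normalized form itself declares either:

```python
# A subset of the author list, in order; Zhang, Kang is a real entry in the record above.
authors = ["Kleinman, Mark E.", "Yamada, Kiyoshi", "Zhang, Kang", "Ambati, Jayakrishna"]

# De-normalized form: one string, order preserved, structure hidden.
joined = "; ".join(authors)

# Naive substring search "finds" an author with family name Kang,
# but "Kang" here is Zhang's given name.
print("Kang" in joined)  # True


def family_names(joined: str) -> list[str]:
    """Undo the join: split into names, then keep the part before the first comma."""
    return [name.split(",", 1)[0].strip() for name in joined.split("; ")]


def has_family_name(joined: str, family: str) -> bool:
    return family in family_names(joined)


print(has_family_name(joined, "Kang"))   # False
print(has_family_name(joined, "Zhang"))  # True
```

Every consumer has to re-implement this parsing, and it silently breaks if a name ever contains the delimiter.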

The more normalized form represents the contributions explicitly. Imagine a contributions table with foreign-key references to an “agents” (or “contributors”) table, to the “references” (or whatever) table, and to a “roles” table, plus an integer column that tracks the “position” within the list. While more complex, this gives some additional advantages, such as being able to distinguish the first three on the list as primary authors and the rest as secondary. In RDF, a fragment would be:
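
The table layout just described can be sketched with Python’s built-in `sqlite3`. This is one possible schema under stated assumptions, not the ontology’s own definition: the table is named `refs` because `REFERENCES` is an SQL keyword, and treating positions 1 to 3 as “primary” is only the illustration from the paragraph above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE agents (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE refs   (id INTEGER PRIMARY KEY, uri TEXT NOT NULL);
CREATE TABLE roles  (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
CREATE TABLE contributions (
    ref_id   INTEGER NOT NULL REFERENCES refs(id),
    agent_id INTEGER NOT NULL REFERENCES agents(id),
    role_id  INTEGER NOT NULL REFERENCES roles(id),
    position INTEGER NOT NULL,  -- 1-based place in the list
    PRIMARY KEY (ref_id, role_id, position)
);
""")

ARTICLE = "http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html"
# The first five authors of the record, at their real positions.
authors = ["Kleinman, Mark E.", "Yamada, Kiyoshi", "Takeda, Atsunobu",
           "Chandrasekaran, Vasu", "Nozaki, Miho"]

ref_id = conn.execute("INSERT INTO refs (uri) VALUES (?)", (ARTICLE,)).lastrowid
role_id = conn.execute("INSERT INTO roles (label) VALUES ('author')").lastrowid
for position, name in enumerate(authors, start=1):
    agent_id = conn.execute("INSERT INTO agents (name) VALUES (?)", (name,)).lastrowid
    conn.execute("INSERT INTO contributions VALUES (?, ?, ?, ?)",
                 (ref_id, agent_id, role_id, position))


def author_names(conn, ref_id, max_position=None):
    """Names of 'author' contributors to ref_id, in list order, optionally truncated."""
    sql = """
        SELECT a.name FROM contributions c
        JOIN agents a ON a.id = c.agent_id
        JOIN roles r  ON r.id = c.role_id
        WHERE c.ref_id = ? AND r.label = 'author'
    """
    params = [ref_id]
    if max_position is not None:
        sql += " AND c.position <= ?"
        params.append(max_position)
    sql += " ORDER BY c.position"
    return [row[0] for row in conn.execute(sql, params)]


print(author_names(conn, ref_id, max_position=3))
# ['Kleinman, Mark E.', 'Yamada, Kiyoshi', 'Takeda, Atsunobu']
```

Because each agent is a row rather than a string, affiliations could hang off `agents` with another foreign key.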

<http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html>
    a bibo:AcademicArticle ;
    bibo:contribution [
        bibo:contributor [ foaf:name "Kleinman, Mark E." ] ;
        bibo:role bibo_roles:author ;
        bibo:position "1"
    ] .
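
Writing that fragment out by hand for 22 authors is tedious, so a minimal Python sketch that generates it from an ordered list may help. It mirrors the fragment’s shape exactly (including the string-valued `bibo:position`); the function names are hypothetical, and the prefix declarations are left out because the post does not give the namespace for `bibo_roles`:

```python
ARTICLE = "http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def contributions_turtle(article: str, authors: list[str]) -> str:
    """Emit one bibo:contribution blank node per author, numbering positions from 1."""
    blocks = []
    for position, name in enumerate(authors, start=1):
        blocks.append(
            "    bibo:contribution [\n"
            f"        bibo:contributor [ foaf:name {turtle_literal(name)} ] ;\n"
            "        bibo:role bibo_roles:author ;\n"
            f"        bibo:position {turtle_literal(str(position))}\n"
            "    ]"
        )
    return f"<{article}>\n    a bibo:AcademicArticle ;\n" + " ;\n".join(blocks) + " .\n"


print(contributions_turtle(ARTICLE, ["Kleinman, Mark E.", "Yamada, Kiyoshi"]))
```

The position is derived from the list index at serialization time, so the order that RIS carried implicitly becomes an explicit, queryable value.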

This has been the agonizing part of designing the new bibliographic ontology. We’ve adopted the normalized approach by adding an explicit Contribution class. It gives a whole lot of flexibility, and maps well to a relational database.

But for legacy data and such, I’d expect some developers might want to use the de-normalized approach above. Thankfully, one can always do both. Triples are pretty cheap, after all, and using one form does not negate the other.
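
One reason doing both is cheap: the de-normalized forms can always be derived from the normalized one, though not the other way around. A minimal Python sketch, where the `Contribution` dataclass is a hypothetical in-memory stand-in for a `bibo:Contribution` node:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """Hypothetical stand-in for one bibo:Contribution node."""
    name: str
    role: str
    position: int


def ordered_names(contributions, role="author"):
    """Names with the given role, sorted by explicit position rather than storage order."""
    return [c.name for c in sorted(contributions, key=lambda c: c.position) if c.role == role]


def as_ris(contributions):
    return "\n".join(f"AU  - {name}" for name in ordered_names(contributions))


def as_delimited(contributions, delimiter="; "):
    return delimiter.join(ordered_names(contributions))


# Stored out of order on purpose: position, not storage order, drives the output.
contribs = [
    Contribution("Yamada, Kiyoshi", "author", 2),
    Contribution("Kleinman, Mark E.", "author", 1),
    Contribution("Takeda, Atsunobu", "author", 3),
]

print(as_delimited(contribs))  # Kleinman, Mark E.; Yamada, Kiyoshi; Takeda, Atsunobu
print(as_ris(contribs))
```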

I do wonder, though, if perhaps we need to distinguish among different kinds of contribution, so as to make it easier to scope positions within different lists (primary-contributions vs. secondary-contributions, etc.).
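
One way to picture that idea is a contribution that names the list it belongs to, with positions counted only within that list. This is purely a hypothetical sketch: `list_name` is not part of the draft ontology, and the primary/secondary split in the sample data is made up for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedContribution:
    name: str
    list_name: str  # hypothetical, e.g. "primary" or "secondary"
    position: int   # 1-based, counted within list_name only


def lists(contributions):
    """Group contributions by list, then order each list by its own positions."""
    grouped = defaultdict(list)
    for c in contributions:
        grouped[c.list_name].append(c)
    return {name: [c.name for c in sorted(members, key=lambda c: c.position)]
            for name, members in grouped.items()}


data = [
    ScopedContribution("Kleinman, Mark E.", "primary", 1),
    ScopedContribution("Yamada, Kiyoshi", "primary", 2),
    ScopedContribution("Nozaki, Miho", "secondary", 2),
    ScopedContribution("Chandrasekaran, Vasu", "secondary", 1),
]

print(lists(data))
```

Each list restarts at 1, so asking for “the second secondary author” no longer depends on how many primary authors there are.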

And Kudos to Ed the Library of Congress

Posted in Technology on April 1st, 2008 by darcusb – Comments Off

OK, so yesterday I went on a rant about a recent Library of Congress technology decision. Today I’d like to call out an example of the sort of awesome work that also goes on there: an experimental server that serves up LoC subject headings using the SKOS vocabulary. Nice work, Ed!

Update: according to Rob Styles, this isn’t a Library of Congress-sponsored effort, so I’m changing the title to reflect that. I hope people out there pick up on how important this sort of effort is, and encourage the Library of Congress to formally support it.