As Fred and I gear up to finally release a formal first draft of the bibliographic ontology, one of the biggest decisions we have had to make is how to represent different kinds of contributions. When you have a single book author, this is easy to do. But there are all kinds of complicated real-world examples that make this a difficult issue to resolve.
Let’s be concrete and look at an example from the journal Nature. We have here an article with 22 contributors. The list of contributors in turn has 12 notes attached to it, which for the most part indicate affiliation, but also group what seem to be primary authors. Finally, after the enumerated notes we have a note that indicates the corresponding author.
So the first question is, how does Nature represent this in a standard legacy format like RIS? Answer: they just have an ordered author list:
TY - JOUR
AU - Kleinman, Mark E.
AU - Yamada, Kiyoshi
AU - Takeda, Atsunobu
AU - Chandrasekaran, Vasu
AU - Nozaki, Miho
AU - Baffi, Judit Z.
AU - Albuquerque, Romulo J. C.
AU - Yamasaki, Satoshi
AU - Itaya, Masahiro
AU - Pan, Yuzhen
AU - Appukuttan, Binoy
AU - Gibbs, Daniel
AU - Yang, Zhenglin
AU - Kariko, Katalin
AU - Ambati, Balamurali K.
AU - Wilgus, Traci A.
AU - DiPietro, Luisa A.
AU - Sakurai, Eiji
AU - Zhang, Kang
AU - Smith, Justine R.
AU - Taylor, Ethan W.
AU - Ambati, Jayakrishna
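To make the "ordered author list" point concrete, here is a minimal sketch of reading the AU lines back out of such a record in Python. The `ris_authors` helper and the tolerant tag pattern are my own illustration, not part of any RIS library, and the sample is only the first few lines of the record above.

```python
import re

# A short excerpt of the RIS record above, with tag spacing as shown there.
RIS_TEXT = """\
TY - JOUR
AU - Kleinman, Mark E.
AU - Yamada, Kiyoshi
AU - Takeda, Atsunobu
"""

# Two-character tag, optional whitespace, a hyphen, then the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s*-\s*(.*)$")


def ris_authors(text):
    """Return the AU values in the order they appear in the record."""
    authors = []
    for line in text.splitlines():
        match = TAG_LINE.match(line.strip())
        if match and match.group(1) == "AU":
            authors.append(match.group(2).strip())
    return authors


authors = ris_authors(RIS_TEXT)
print(authors)
```

The only thing carrying the order here is the position of each line in the file, which is exactly what gets lost in an unordered model.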
How do you do this in a more relational model, though, say a relational database or RDF? Both of these are unordered models: neither the rows of a table nor the triples of a graph have any inherent sequence.
One option is to simply translate this directly to RDF:
a bibo:AcademicArticle ;
dc:creator "Kleinman, Mark E." ;
dc:creator "Yamada, Kiyoshi" ;
dc:creator "Takeda, Atsunobu" ;
dc:creator "Chandrasekaran, Vasu" ;
dc:creator "Nozaki, Miho" ;
dc:creator "Baffi, Judit Z." ;
dc:creator "Albuquerque, Romulo J. C." ;
dc:creator "Yamasaki, Satoshi" ;
dc:creator "Itaya, Masahiro" ;
dc:creator "Pan, Yuzhen" ;
dc:creator "Appukuttan, Binoy" ;
dc:creator "Gibbs, Daniel" ;
dc:creator "Yang, Zhenglin" ;
dc:creator "Kariko, Katalin" ;
dc:creator "Ambati, Balamurali K." ;
dc:creator "Wilgus, Traci A." ;
dc:creator "DiPietro, Luisa A." ;
dc:creator "Sakurai, Eiji" ;
dc:creator "Zhang, Kang" ;
dc:creator "Smith, Justine R." ;
dc:creator "Taylor, Ethan W." ;
dc:creator "Ambati, Jayakrishna" .
This is what Ingenta does in its RSS/RDF feeds. The problem here is that you lose order, and hence any indication of relative contribution. You also aren’t treating the authors as full objects, just as dumb strings. You can’t, for example, attach affiliation information to them.
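A rough way to see the loss of order is to model the graph as a plain Python set of triples. This is only a stand-in for an RDF store, not a real RDF library, and the `ex:article` identifier is made up for the example.

```python
# Hypothetical identifier for the article; the listing above does not give one.
ARTICLE = "ex:article"

# An RDF graph is a set of (subject, predicate, object) statements.
as_published = {
    (ARTICLE, "dc:creator", "Kleinman, Mark E."),
    (ARTICLE, "dc:creator", "Yamada, Kiyoshi"),
    (ARTICLE, "dc:creator", "Takeda, Atsunobu"),
}
reshuffled = {
    (ARTICLE, "dc:creator", "Takeda, Atsunobu"),
    (ARTICLE, "dc:creator", "Kleinman, Mark E."),
    (ARTICLE, "dc:creator", "Yamada, Kiyoshi"),
}

# Same statements, so the same graph: the listing order was never data.
same_graph = as_published == reshuffled
print(same_graph)

# All a consumer can recover is the set of names, not who came first.
creators = {o for (s, p, o) in as_published if p == "dc:creator"}
```

Because the objects are bare strings, there is also nothing to hang a second statement on, such as an affiliation.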
Another option is an even simpler de-normalized form: a single string holding a delimited list of author names. In RDF, you’d basically join the creator strings into a single property value.
This preserves order, but it doesn’t get you very far. From the data model’s perspective, the content of that string is totally opaque. You can’t, for example, search by author name without some programming gymnastics.
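Here is a small sketch of what those gymnastics look like in Python. The "; " delimiter is an assumption on my part (the names already contain commas, so a comma won't do), and `split_creators` and `has_author` are hypothetical helpers.

```python
# Assumption: names are joined with "; " because each name already has a comma.
creator_field = "Kleinman, Mark E.; Yamada, Kiyoshi; Takeda, Atsunobu"


def split_creators(field, delimiter=";"):
    """Split the opaque string back into individual name strings."""
    return [name.strip() for name in field.split(delimiter) if name.strip()]


def has_author(field, family_name):
    """True if any name in the field has exactly this family name."""
    for name in split_creators(field):
        family = name.split(",", 1)[0].strip()
        if family.lower() == family_name.lower():
            return True
    return False


# A naive substring test gives a false positive: "Yamada" contains "Yama".
naive_hit = "Yama" in creator_field
# Parsing first gives the right answer, at the cost of knowing the delimiter.
parsed_hit = has_author(creator_field, "Yama")
print(naive_hit, parsed_hit)
```

Every consumer of the data has to know the delimiter and the name format to do this, and none of that knowledge is in the model itself.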
The more normalized form would represent the contributions explicitly. So, imagine a contributions table with foreign key references to both an “agents” or “contributors” table and to the “references” (or whatever) table, plus a foreign key reference to a “roles” table, and an integer column that tracks the “position” within the list. While more complex, this gives some additional advantages, such as being able to distinguish the first three on the list as primary authors, and the rest as secondary. In RDF, a fragment would be:
a bibo:AcademicArticle ;
bibo:contributor [ foaf:name "Kleinman, Mark E." ] ;
bibo:role bibo_roles:author ;
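For the relational side, the contributions table described above could be sketched with Python's built-in `sqlite3` module roughly as follows. The table and column names are my own guesses at a reasonable layout (the references table is called `refs` here because REFERENCES is an SQL keyword), and only the first four authors are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agents (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE refs   (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE roles  (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
CREATE TABLE contributions (
    ref_id   INTEGER NOT NULL REFERENCES refs(id),
    agent_id INTEGER NOT NULL REFERENCES agents(id),
    role_id  INTEGER NOT NULL REFERENCES roles(id),
    position INTEGER NOT NULL,            -- place within the list
    PRIMARY KEY (ref_id, role_id, position)
);
""")

conn.execute("INSERT INTO refs VALUES (1, 'AcademicArticle')")
conn.execute("INSERT INTO roles VALUES (1, 'author')")
names = ["Kleinman, Mark E.", "Yamada, Kiyoshi",
         "Takeda, Atsunobu", "Chandrasekaran, Vasu"]
for position, name in enumerate(names, start=1):
    cur = conn.execute("INSERT INTO agents (name) VALUES (?)", (name,))
    conn.execute("INSERT INTO contributions VALUES (1, ?, 1, ?)",
                 (cur.lastrowid, position))


def authors_in_order(ref_id, limit=None):
    """Author names for one reference, sorted by the position column."""
    sql = """SELECT a.name
             FROM contributions c
             JOIN agents a ON a.id = c.agent_id
             JOIN roles r ON r.id = c.role_id
             WHERE c.ref_id = ? AND r.label = 'author'
             ORDER BY c.position"""
    params = [ref_id]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


all_authors = authors_in_order(1)
primary = authors_in_order(1, limit=3)  # "the first three on the list"
print(primary)
```

Order is now ordinary data in the `position` column, and the agents are rows that affiliations or anything else can point at.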
This has been the agonizing part of designing the new bibliographic ontology. We’ve adopted the more normalized approach by adding an explicit Contribution class. It gives a whole lot of flexibility, and maps well to a relational database.
But for legacy data and such, I’d expect some developers might want to use the de-normalized approach above. Thankfully, one can always do both. Triples are pretty cheap, after all, and using one form does not negate the other.
I do wonder, though, if perhaps we need to distinguish among different kinds of contribution, so as to make it easier to scope positions within different lists (primary-contributions vs. secondary-contributions, etc.).